
Term Weight Formula


21 replies to this topic

#1 T Bill

T Bill

    HR 4

  • Active Members
  • 107 posts
  • Location:Mountainair, New Mexico USA

Posted 05 February 2005 - 02:47 PM

If nobody minds, I'd like to go back to a post Dan Thies had about Dr. Garcia's formula. It was a different thread and the thread had changed course so I posted this as a new one.

First - Thanks Dan! This is great stuff.

As Dan says, this changes the way we should be looking at keyword density.

I think a very important bit of information is this:

"Thus, terms which appear in too many documents (e.g., stopwords, very frequent terms) receive a low weight, while uncommon terms which appear in few documents and receive a high weight."

Heavy.

Now, my site sells t-shirts and I have "t-shirt" on every page in every document more often than the word "the". We're probably killing ourselves for that search term but it is definitely good for the real visitors.

Hmmm. Can I offer hundreds of variations of "100% Cotton Superman Knit Garment with a Tubular Body and Short Sleeves resembling the letter 'T'" instead of just "100% Cotton Superman T-Shirts"? I'd reserve the word "T-Shirts" for only a few popular items. What does this do to the visitor experience?

I'm not going to drop the word altogether, but we will definitely be reducing its use.

Obviously, this is much more important for keyword phrases.

-----------------------------------

OK. This is a great formula, but it must be for competitive intelligence, right? If I run this formula on my site I will get a Term Weight, but it is not terribly useful unless I have a target Term Weight. I assume this target Term Weight is one that comes from running the formula on a competitor's site that comes up #1. Right?

Running this formula on a competitor's site that is ranking well would give very valuable knowledge. So I have a few questions.

1. Can you tell me what Dr. Garcia means by "documents in a database"? Since we are talking about HTML pages, would this be documents within a directory?

2. How can I tell how many documents someone else has in their "database"?

3. How can I search only those documents to get the number of occurrences of a keyword phrase in each document?

- Or... is there a universal good Term Weight? That would save us all a lot of time.

At first, this is complicated, but on second look, it's not so tough as long as you have a calculator that handles Logs.

However, I have still not come up with a solid number.

So, Term Weight should equal

w = tf * log(D / df)

Or...

Term Weight = number of times a term occurs in a document multiplied by: the Log of: Number of documents in the database divided by Number of documents containing the term.
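
For anyone who wants to try the arithmetic, here is a minimal Python sketch of the formula as worded above. The base-10 log, the function name `term_weight` and the 1,000-document collection are assumptions made for illustration; none of them come from Dr. Garcia's article.

```python
import math


def term_weight(tf: int, total_docs: int, docs_with_term: int) -> float:
    """Term weight = tf * log10(D / df), following the wording in the post.

    tf             -- times the term occurs in the document
    total_docs     -- D, number of documents in the database
    docs_with_term -- df, number of documents containing the term
    The log base is an assumption; the post does not state one.
    """
    if tf < 0 or docs_with_term <= 0 or total_docs < docs_with_term:
        raise ValueError("need tf >= 0 and 0 < docs_with_term <= total_docs")
    return tf * math.log10(total_docs / docs_with_term)


# Hypothetical collection of 1,000 documents.
rare = term_weight(tf=3, total_docs=1000, docs_with_term=10)      # term in few documents
common = term_weight(tf=3, total_docs=1000, docs_with_term=1000)  # term in every document
print(rare, common)
```

With these made-up numbers, the term that appears in every document gets a weight of zero however often it occurs on the page, which is the effect the quoted passage describes.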

With answers to these questions, our copywriting could be very effective.

Dan, anyone? I'm interested in finding out if I am on base or not.

Here is the original link Dan submitted:
http://www.miislita.com/term-vector/term-vector-1.html

Thanks.

#2 T Bill

T Bill

    HR 4

  • Active Members
  • 107 posts
  • Location:Mountainair, New Mexico USA

Posted 06 February 2005 - 06:15 AM

There must also be a weight applied to a term that is found on every page. If it is not a general term, wouldn't that term be what the site is about? How do you sell t-shirts without having that word a number of times on every page of the site?

Am I missing something? There must be another formula as well. Because a site has a subject, it is about something. Whatever that something is will be a theme throughout the site. Variations on that theme should make specific pages unique within the site and I can see the unique pages getting higher weight for that term but wouldn't the site need to be ranked high simply because that theme is found on so many pages? It seems to me that this is why Google will show two results from the same site within a search.

So, there is the weight of the term throughout a site and then there is the weight of unique terms within the site. It would also seem logical to look at a term that is found throughout a site and give a page, or pages, that used a variation of the original term much more weight in searches for that term variation.

So, there must be two formulas, at least.

Any ideas?

#3 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,324 posts

Posted 06 February 2005 - 10:56 AM

No ideas here. You might as well be speaking Greek as far as I'm concerned, as none of what you said has anything to do with the work I do. :)

#4 DanThies

DanThies

    Keyword Super Freak

  • Moderator
  • 865 posts
  • Location:Texas, y'all

Posted 06 February 2005 - 04:24 PM

I hate to point someone at another forum, but Dr. Garcia doesn't post here. He does post at the Search Engine Watch forums, as Orion. Just about everything you're asking here has already been discussed there.

You may be reading too much into this, though. The main point I was trying to get across is that "keyword density" is not part of the algorithm that retrieves search results.

I'll address a couple questions to clarify...

The number of documents in a database refers to how many documents exist in the search engine's database. Google has 8 billion documents in their database, although they may not actually have them all indexed.

The term weight refers to how important that specific term is in retrieving results for a specific query. Weights differ from one query to the next, so there is no magic formula for writing a perfect web page.

When Jill says it has nothing to do with the work she does, she's right in one sense, because she just uses natural copywriting. On the other hand, what this stuff really means is that natural copywriting is the logical approach... so in that sense it has everything to do with the work she does.

Wow. I'm having another Zen moment, my conscious mind is breaking. What is the sound of two hands clapping? (applause)

#5 sonnyyu

sonnyyu

    HR 4

  • Active Members
  • 157 posts

Posted 06 February 2005 - 05:24 PM

Here we go (applause)

#6 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,324 posts

Posted 06 February 2005 - 05:35 PM

My point was that you don't really need to know that stuff to do your job, and in fact, knowing it may end up doing you more harm than good because you would be working toward the formulas instead of just creating great web pages naturally!

Jill

#7 copywriter

copywriter

    HR 7

  • Moderator
  • 1,805 posts
  • Location:South Carolina, USA

Posted 07 February 2005 - 07:07 PM

QUOTE
there is no magic formula for writing a perfect web page.


Agreed.

TBill, it's not as hard as you're making it. It also isn't the same for every page/industry/keyphrase.

Read this if you haven't already. It may help.

#8 randfish

randfish

    Daily SEO Show Anchor

  • Active Members
  • 229 posts
  • Location:Seattle, WA

Posted 07 February 2005 - 07:40 PM

I have to disagree somewhat. I believe it's not only important, but critical to understand these elements and what they mean in order to be a great SEO. TF * IDF is a concept that refers to the term frequency times the inverse document frequency.

Basically, keyword density is not measured by IR systems (including search engines). Instead, search engines compare the frequency of occurrence of a particular term or phrase, normalized against the total number of terms in the document, then compare that to the number of documents in the database (index) and those that contain the term. It does sound complex, but the equations are simple and very informative.

There are two I use consistently:

Classic Normalized Term Weight uses the following equation:

Wi = (tfdi / max tfdi) * log(D / dfi)

Where:
tfdi = frequency of the term (or phrase of a given length) in the document
max tfdi = maximum frequency of any phrase with the same number of words in the document
D = number of documents in the database (when using Google, I estimate 8.1 billion)
dfi = number of documents containing the term/phrase (# of results for a search in quotes)
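
To make the equation concrete, here is a minimal Python sketch of it. The base-10 log, the function name `normalized_term_weight` and the sample counts are assumptions; only the 8.1 billion figure comes from the post.

```python
import math


def normalized_term_weight(tf: int, max_tf: int, total_docs: float,
                           docs_with_term: float) -> float:
    """Wi = (tf / max_tf) * log10(D / df).

    tf             -- frequency of the term or phrase in the document
    max_tf         -- highest frequency of any phrase of the same word count
    total_docs     -- D, documents in the database
    docs_with_term -- df, documents containing the term or phrase
    The log base is an assumption; the post does not state one.
    """
    if max_tf <= 0 or not 0 <= tf <= max_tf:
        raise ValueError("need 0 <= tf <= max_tf and max_tf > 0")
    if docs_with_term <= 0 or total_docs < docs_with_term:
        raise ValueError("need 0 < docs_with_term <= total_docs")
    return (tf / max_tf) * math.log10(total_docs / docs_with_term)


# Hypothetical page: the phrase occurs 4 times, the most frequent phrase of the
# same length occurs 8 times, and a quoted search reports 81,000 results.
w = normalized_term_weight(tf=4, max_tf=8,
                           total_docs=8_100_000_000, docs_with_term=81_000)
print(round(w, 3))
```

Because tf is divided by max tf, repeating the phrase more often than every other phrase on the page cannot push the first factor above 1.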

A second equation, Glasgow Weight, can also be useful (I generally use both when analyzing my own site vs. the competition):

Wij = (log(freqij + 1) / log(lengthj)) * log(N / ni)

Where:
freqij = frequency of term i (a word or phrase of a given length) in document j
lengthj = number of unique terms (words or phrases of the same length) in document j
N = number of documents in the database (again, I use 8.1 billion for Google)
ni = number of documents containing the term (results of a search in quotes)
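
The same kind of sketch works for the Glasgow weight. As before, the base-10 log, the function name `glasgow_weight` and the sample counts are assumptions for illustration.

```python
import math


def glasgow_weight(freq: int, unique_terms: int, total_docs: float,
                   docs_with_term: float) -> float:
    """Wij = (log10(freq + 1) / log10(length)) * log10(N / n).

    freq           -- frequency of the term or phrase in the document
    unique_terms   -- number of unique terms of that length in the document
    total_docs     -- N, documents in the database
    docs_with_term -- n, documents containing the term
    The log base is an assumption; it only affects the last factor, because
    the ratio of the first two logs is the same in any base.
    """
    if freq < 0 or unique_terms < 2:
        raise ValueError("need freq >= 0 and at least 2 unique terms")
    if docs_with_term <= 0 or total_docs < docs_with_term:
        raise ValueError("need 0 < docs_with_term <= total_docs")
    tf_part = math.log10(freq + 1) / math.log10(unique_terms)
    return tf_part * math.log10(total_docs / docs_with_term)


# Hypothetical page with 100 unique terms and 81,000 quoted-search results.
nine_times = glasgow_weight(9, 100, 8_100_000_000, 81_000)
nineteen_times = glasgow_weight(19, 100, 8_100_000_000, 81_000)
print(round(nine_times, 3), round(nineteen_times, 3))
```

Since the frequency sits inside a log, going from 9 to 19 occurrences in this example raises the weight by far less than double.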

What these are telling us is that keyword stuffing and long pages with many keywords are not the answer to page optimization. Given the normalization process and the simplicity of these equations, it's pretty easy to check your competition (I'd recommend analyzing the top 20 pages) and then your own page.

#9 tempy

tempy

    HR 5

  • Active Members
  • 344 posts

Posted 08 February 2005 - 06:34 AM

I also don't understand those formulae. But as a graduate of linguistics (amongst other things - but let's not go there) I also have to question the term 'natural copywriting'. There really is no such thing.

I am fully willing to accept that the forms of copywriting that Jill and the others here practise work wonders and are probably the best consistent method of achieving SEO results. But they are still targeted at achieving a successful ranking, even if that is an alleged by-product of producing a well-written, readable web site, and whether we are conscious of it or not. Writing for the web has always been different from writing for the printed medium. Neither is 'natural'; both have evolved in ways that are far from natural. In fact the English written word has often been specifically engineered for different circumstances and communication mediums. Google is one such medium.

#10 copywriter

copywriter

    HR 7

  • Moderator
  • 1,805 posts
  • Location:South Carolina, USA

Posted 08 February 2005 - 04:21 PM

Randfish, is what you're saying that using either of these formulas, you can take the exact same steps toward optimizing a site that wants to rank highly for the phrase "stapler parts" and one that wants to rank highly for "website design" and give them both grand results? I'm not being sarcastic here, I am truly curious.

QUOTE
have to question the term 'natural copywriting'. There really is no such thing.


Sure there is. The phrase "natural copywriting" was created (by whom, I don't know) to represent SEO copywriting that isn't forced. It flows... it's difficult or impossible to find the keyphrases within the copy... it seems unencumbered. "Natural copywriting" is a concocted term just like "search engine copywriting." Until Jill invented it, there was no such thing :)

#11 sonnyyu

sonnyyu

    HR 4

  • Active Members
  • 157 posts

Posted 08 February 2005 - 05:20 PM

The bad news is that Google now supports 103 languages in total, so people who only do natural copywriting in English are covering 0.97% of the natural copywriting under the sun.

#12 randfish

randfish

    Daily SEO Show Anchor

  • Active Members
  • 229 posts
  • Location:Seattle, WA

Posted 08 February 2005 - 05:29 PM

QUOTE(copywriter @ Feb 8 2005, 05:21 PM)
Randfish, is what you're saying that using either of these formulas, you can take the exact same steps toward optimizing a site that wants to rank highly for the phrase "stapler parts" and one that wants to rank highly for "website design" and  give them both grand results?
copywriter,
This is not a 'magic formula' for writing web pages. This is the application of scientific methodology to search engine optimization. What I'm suggesting is that rather than just using 'natural copywriting', which I believe is great and should be the first step, you also use these formulas.

Let me lay out a scenario. You want to write about a new Diet Fanta Soda. You want to rank well for the phrase - diet fanta. My recommendation would be to follow the steps below:

1. Write a page of great copy that is usable (broken up into smaller paragraphs, uses bullet points, etc.) and readable and convincing.
2. Analyze the top 20 sites ranking for the term 'diet fanta' at Google (I would use Yahoo! & MSN as well).
3. Conduct an analysis of the term weight - using both formulas above - of the phrase 'diet fanta' for each ranking page.
4. Compare the results and make sure that your own page's term weight for 'diet fanta' is similar to the top results (unlike with KW density, I usually go a little higher than the competition).
5. Edit your copy to comply with this formula.
6. Start this analysis for related words, etc.
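
Steps 3 and 4 above could be roughed out in Python along these lines. This is only a sketch under assumptions: the helper names (`tokenize`, `phrase_counts`, `page_weight`, `compare`), the simple word tokenizer, the base-10 log and the sample texts and counts are all made up for illustration, and it uses the classic normalized weight only.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive words; runs may cross sentence breaks."""
    words = tokenize(text)
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


def page_weight(text: str, phrase: str, total_docs: float,
                docs_with_phrase: float) -> float:
    """Classic normalized weight: (tf / max tf) * log10(D / df)."""
    target = tokenize(phrase)
    if not target:
        raise ValueError("phrase must contain at least one word")
    counts = phrase_counts(text, len(target))
    if not counts:
        return 0.0  # page is shorter than the phrase
    tf = counts[" ".join(target)]
    return tf / max(counts.values()) * math.log10(total_docs / docs_with_phrase)


def compare(my_text: str, competitor_texts: list[str], phrase: str,
            total_docs: float, docs_with_phrase: float) -> dict[str, float]:
    """Return my page's weight next to the lowest and highest competitor weight."""
    if not competitor_texts:
        raise ValueError("need at least one competitor page")
    theirs = [page_weight(t, phrase, total_docs, docs_with_phrase)
              for t in competitor_texts]
    return {"mine": page_weight(my_text, phrase, total_docs, docs_with_phrase),
            "lowest": min(theirs), "highest": max(theirs)}


# Hypothetical copy and counts, not real pages or real search results.
result = compare("Diet Fanta is new. Try diet fanta today.",
                 ["Fanta soda", "diet fanta diet fanta"],
                 "diet fanta", total_docs=1000, docs_with_phrase=10)
print(result)
```

The page text would have to be fetched and stripped of markup first, and the document counts would come from whatever estimate you trust; neither is handled here.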

Search engines will always rely on math and equations for ranking. There is no reason why a professional optimization specialist should not be armed with the same knowledge and apply it.

#13 DanThies

DanThies

    Keyword Super Freak

  • Moderator
  • 865 posts
  • Location:Texas, y'all

Posted 08 February 2005 - 06:49 PM

QUOTE(randfish @ Feb 8 2005, 04:29 PM)
copywriter,
This is not a 'magic formula' for writing web pages. This is the application of scientific methodology to search engine optimization. What I'm suggesting is that rather than just using 'natural copywriting', which I believe is great and should be the first step, you also use these formulas.

...

Search engines will always rely on math and equations for ranking. There is no reason why a professional optimization specialist should not be armed with the same knowledge and apply it.

Yes, search engines run on math, but there are a lot of assumptions loaded in there. You aren't really armed with the same knowledge, are you?

For one thing, when you graph out the top 20 pages for density, term weight, or whatever, it's going to be a scattergraph. There's too much happening off the page. Search engines aren't using averages - that's the mistake everyone makes in thinking about keyword density.

I suppose you could add a "-inanchor:diet -inanchor:fanta" to try to eliminate some of the off page factors from the set of pages, but you're still taking a shot in the dark - may as well just use the top ranked page as a template and replace their words with your own.

If you don't know how different types of hits (heading, bold, hyperlink, etc.) are weighted in the algorithm then you don't know what you've really got there anyway. Since this part of the algorithm may very well be query-dependent, you may as well rely on averages as any other guess.

Show me good natural copy that doesn't fall within the range on density, term weight, or whatever vs. other ranked pages.

#14 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,324 posts

Posted 08 February 2005 - 07:11 PM

Aww Randfish...you take all the fun out of things!

No wonder we're all getting so bored with SEO!

#15 randfish

randfish

    Daily SEO Show Anchor

  • Active Members
  • 229 posts
  • Location:Seattle, WA

Posted 08 February 2005 - 07:23 PM

Dan,

If I hear you right, you're suggesting that the use of term weight, or any other calculation of a statistical formula for that matter, is not useful unless you know precisely how the SEs are weighting them in the overall algo (and which specific pieces get +/-).

I'd have to disagree. I think that using formulas (even approximated, incomplete ones) like term weight and watching where the data falls, especially if done over a large number of queries, can result in better optimization techniques.

Don't get me wrong, I know that this is a guess. I know that the math being used here is not 100% accurate. I even know that the major SEs probably do not use the two formulas above exactly. But, I also know that this type of analysis can be valuable. It's one more tool in the toolbox and I would recommend taking advantage of it, given my current level of knowledge.

Maybe, Dan, you can tell us why, at your level of knowledge, you don't use these (and possibly even hint at what you do use :)).



