Linearization, Tokenization ....
#1
Posted 06 April 2005 - 08:52 PM
thanks
#2
Posted 06 April 2005 - 11:26 PM
I get a huge kick out of those who want to make SEO more complicated than it need be. Makes me
#3
Posted 07 April 2005 - 01:59 AM
"Linearization" is a mathematical term that can mean several things, depending on the context. I suspect you stumbled across a discussion of document indexing methodology. There is someone who appears to be establishing himself in a big way as an SEO authority (in order to launch his Web optimization business) by lecturing people about document indexing methods.
In layman's terms, I would describe a linearizing process as one which organizes data into a vector -- a collection of data elements which are represented by a simple array like (a,b,c,d,e). You can perform certain kinds of operations on the vectors, and they produce mathematically significant results which can lead you into Boolean algebra, set theory, and dealing with a lot of 1s and 0s.
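To make the "1s and 0s" idea concrete, here is a minimal sketch of documents represented as binary term vectors. The vocabulary, the sample documents, and the helper name to_vector are all invented for illustration; this is not a description of how any particular search engine stores pages.

```python
# Hypothetical vocabulary; each position in a vector corresponds to one term.
vocabulary = ["seo", "links", "css", "tables", "text"]

def to_vector(words, vocab):
    """Return a list of 1s and 0s: 1 if the vocabulary term occurs in words."""
    present = set(words)
    return [1 if term in present else 0 for term in vocab]

# Two made-up documents, already split into words.
doc_a = to_vector("css links text".split(), vocabulary)
doc_b = to_vector("seo links tables text".split(), vocabulary)

# Element-wise Boolean AND keeps only the terms both documents contain.
shared = [a & b for a, b in zip(doc_a, doc_b)]
shared_terms = [term for term, bit in zip(vocabulary, shared) if bit]

print(doc_a)         # [0, 1, 1, 0, 1]
print(doc_b)         # [1, 1, 0, 1, 1]
print(shared_terms)  # ['links', 'text']
```

Once documents are in this form, comparing them reduces to simple operations on arrays of the same length.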
Tokenization comes from computer science and it refers to simplifying a data set by replacing complex data structures (such as words) with simpler ones (such as numbers). In short, you create a cipher over a set of words or expressions, in which each word or expression is represented by a number (the token).
I have seen people discuss tokenization in two contexts with respect to search engines. Pages or documents can be tokenized (assigned a unique identifier, an ID number) and words within documents can be tokenized (also assigned a unique identifier, but one which is used in a different context from the page identifier).
The word tokens allow for more efficient document storage, content indexing, sorting, and querying.
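The two kinds of tokens described above (page IDs and word IDs) can be sketched roughly as follows. The function name build_token_table and the sample pages are assumptions made up for this example, and splitting on whitespace is a simplification.

```python
def build_token_table(documents):
    """Assign each document and each distinct word an integer ID."""
    word_ids = {}   # word -> word token (assigned in order of first appearance)
    encoded = {}    # document ID -> list of word tokens
    for doc_id, text in enumerate(documents):
        tokens = []
        for word in text.lower().split():
            if word not in word_ids:
                word_ids[word] = len(word_ids)
            tokens.append(word_ids[word])
        encoded[doc_id] = tokens
    return word_ids, encoded

# Hypothetical page contents.
pages = ["Links help rankings", "Readable text helps rankings"]
word_ids, encoded = build_token_table(pages)

print(encoded[0])  # [0, 1, 2]
print(encoded[1])  # [3, 4, 5, 2]
```

Note that "rankings" gets the same number (2) in both pages, which is what makes storing and comparing the pages cheaper than working with the raw words.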
Filtration is just what it sounds like. A filtering process is defined and applied to a data set in order to weed out or filter out undesirable data elements. Search engines strive to filter out spam pages, dead pages, duplicate pages, and irrelevant pages at various stages in their functions.
Stemming refers to reducing long, complex words to their roots. In effect, you take a word like "stemming" and equate it with "stem", a word like "difficulty" and equate it with "difficult", etc.
Stemming allows you to work with a smaller token set. It is also believed (and probably mathematically proven in some arcane way) to help qualify word collections -- that is, you can associate related words by trimming as much excess baggage off them as possible and reducing them to their root or near-root values.
Running is related to runabout, which is related to runup, which is related to runover, and so on.
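As a rough illustration of how stemming shrinks the token set, here is a deliberately naive suffix stripper. It is a toy, not the Porter stemmer or any algorithm a search engine is known to use; the function name naive_stem and its handful of rules are assumptions for this sketch, and it only handles a few inflectional endings.

```python
from collections import defaultdict

def naive_stem(word):
    """Strip one common suffix and undo a doubled final consonant."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # "stemm" -> "stem", "runn" -> "run"; leave ll/ss/zz alone.
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeioulsz":
                word = word[:-1]
            break
    return word

# Group related words under their shared stem.
groups = defaultdict(list)
for w in ["stemming", "stems", "stemmed", "running", "runs"]:
    groups[naive_stem(w)].append(w)

print(dict(groups))
# {'stem': ['stemming', 'stems', 'stemmed'], 'run': ['running', 'runs']}
```

Five distinct words collapse to two stems, so anything indexed by stem needs far fewer tokens.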
I don't see much value in these academic discussions on the forums. Most people don't understand the exotic disciplines involved, the participants have to rely upon a great deal of technical jargon (although they do make attempts to explain some of it), and the math is usually well beyond the average SEO's comprehension or point of caring.
I am not going to pretend to be able to absorb it all myself. But I've had enough computer science, set theory, vector algebra, and matrix algebra to know where they are coming from. I just don't care to end up where they think they are going.
#4
Posted 07 April 2005 - 06:54 AM
I not only understood your explanation of the terms and where they fit in search engine tech, I also understood why, like you, I have no need to care!
Being a word guy, though, I do appreciate the definitions. Thanks.
L.
#5
Posted 07 April 2005 - 01:15 PM
However, for those of us, like myself, who are fascinated by how the search engines actually work and enjoy optimizing from that frame of mind - these concepts are invaluable.
Stemming is used in the actual analysis of a given document so that the most important concepts can be extracted. This is a method search engines use to help them find the topic of a particular site/page.
Filtration is used in all sorts of things, but the first example I always think of is its use in document analysis to remove stopwords.
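Stopword removal is easy to sketch. The stopword list below is a tiny made-up sample (real lists are much longer and vary from system to system), and remove_stopwords is a hypothetical helper name.

```python
# A small illustrative stopword list, not any engine's actual list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split on whitespace, and drop stopwords."""
    return [word for word in text.lower().split() if word not in stopwords]

remaining = remove_stopwords("The importance of links to a page")
print(remaining)  # ['importance', 'links', 'page']
```

What is left after filtering is the set of words that actually carry the topic of the page.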
I think it's oversimplifying to say that these concepts are meaningless. To me, it's a critical part of my profession to understand everything I possibly can about the search engines' technology and processes. This doesn't mean that you can't do without this knowledge, but it's a very good feeling when I see changes in the SERPs and can reason them out and narrow the cause down to a few possibilities. I just don't like the feeling of being in the dark on the whole thing.
Dr. Garcia's work (the source of your inquiry) has been invaluable to many SEOs in the business, but as you can see, there are people on both sides of the fence. If you do want to read more on the subject, visit his website - www.miislita.com
#6
Posted 07 April 2005 - 02:27 PM
The other words though, I've not heard in regards to SEO.
#7
Posted 07 April 2005 - 02:37 PM
The other words though, I've not heard in regards to SEO.
The jury is still out (at least, MINE is) on just how useful it is to understand document analysis for SEO. Most good SEOs seem to understand that CSS is preferable to table formatting, links help, using indexable human-readable text on pages is good, etc., etc.
Some folks just go in for the esoteric stuff.
And then some of us like to second-guess the search engines, although I don't like spending a lot of time talking about vector analysis.
Some SEO sites do use these buzzwords in their sales pitches. Whatever floats your boat, I guess.
#8
Posted 08 April 2005 - 12:57 PM
The context for Dr. Garcia's use of those terms is to bust the myth of the importance of arriving at specific keyword density metrics for optimization.
Edited by Winooski, 08 April 2005 - 01:06 PM.
#9
Posted 08 April 2005 - 01:32 PM
#10
Posted 08 April 2005 - 02:03 PM
Well, technically, we're actually more like "grilling" than "sacrificing".
And, okay, it's
But the idea is the same! #1 on Google, here we come!
--Torka