High Rankings Search Engine Optimization Forum | High Rankings Advisor Search Marketing Newsletter

Linearization, Tokenization..., Question...
webkid_san
post Apr 6 2005, 08:52 PM
Post #1


HR 4

Group: Active Members
Posts: 282
Joined: 21-September 04
Member No.: 5,136



Please excuse my ignorance ... I was going over a forum where people were talking about linearization, tokenization, filtration, and stemming, and I couldn't understand it at all ... I was wondering if somebody could guide me to where I can read up on the concepts of linearization, tokenization, filtration, and stemming.

thanks
Jill
post Apr 6 2005, 11:26 PM
Post #2


High Rankings Advisor

Group: Admin
Posts: 29,201
Joined: 21-July 03
From: Ashland, MA
Member No.: 2



Never heard of them, you may want to ask over at the forum where you read about it.

I get a huge kick out of those who want to make SEO more complicated than it need be. Makes me laugh out loud.
Michael Martinez
post Apr 7 2005, 01:59 AM
Post #3


HR 8

Group: Active Members
Posts: 3,718
Joined: 5-April 05
From: Seattle, WA
Member No.: 7,091



QUOTE(webkid_san @ Apr 6 2005, 08:52 PM)
Please excuse my ignorance ... I was going over a forum where people were talking about linearization, tokenization, filtration, and stemming, and I couldn't understand it at all ... I was wondering if somebody could guide me to where I can read up on the concepts of linearization, tokenization, filtration, and stemming.


"Linearization" is a mathematical term, and it can mean several things depending on the context. I suspect you stumbled across a discussion of document indexing methodology. There is someone who appears to be establishing himself in a big way as an SEO authority (in order to launch his Web optimization business) by lecturing people about document indexing methods.

In layman's terms, I would describe a linearizing process as one which organizes data into a vector: a collection of data elements represented by a simple array like (a,b,c,d,e). You can perform certain kinds of operations on these vectors, and they produce mathematically significant results, which can lead you into Boolean algebra, set theory, and dealing with a lot of 1s and 0s.
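To make the vector idea a little more concrete, here is a minimal Python sketch. It assumes a tiny, hypothetical fixed vocabulary and treats each document as a 0/1 vector (1 if the word occurs, 0 if not); the vocabulary, sample texts, and helper names are made up for illustration and say nothing about how any real search engine stores pages. The element-wise AND and OR correspond to set intersection and union, which is where the Boolean algebra and set theory come in.

```python
# Hypothetical fixed vocabulary: each vector position stands for one word.
VOCAB = ["search", "engine", "rank", "stem", "token"]

def to_vector(text: str) -> list[int]:
    """Map a document to a 0/1 vector: 1 if the vocabulary word occurs in it."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in VOCAB]

def vec_and(a: list[int], b: list[int]) -> list[int]:
    """Element-wise AND: words present in both documents (set intersection)."""
    return [x & y for x, y in zip(a, b)]

def vec_or(a: list[int], b: list[int]) -> list[int]:
    """Element-wise OR: words present in either document (set union)."""
    return [x | y for x, y in zip(a, b)]

doc1 = to_vector("search engine rank")
doc2 = to_vector("stem token search")
print(doc1)                 # [1, 1, 1, 0, 0]
print(doc2)                 # [1, 0, 0, 1, 1]
print(vec_and(doc1, doc2))  # [1, 0, 0, 0, 0]
print(vec_or(doc1, doc2))   # [1, 1, 1, 1, 1]
```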

Tokenization comes from computer science, and it refers to simplifying a data set by replacing complex data elements (such as words) with simpler ones (such as numbers). In short, you create a cipher over a set of words or expressions, where each word or expression is represented by a number (the token).
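Following the definition above (words replaced by numbers), a minimal Python sketch might look like this. It assumes words are separated by whitespace and assigns IDs in order of first appearance; the function names and sample texts are hypothetical, and real systems use far more careful rules for splitting text into words.

```python
def build_vocabulary(texts: list[str]) -> dict[str, int]:
    """Assign each distinct word a small integer ID, in order of first appearance."""
    vocab: dict[str, int] = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Replace each word with its numeric token; unknown words are skipped."""
    return [vocab[w] for w in text.lower().split() if w in vocab]

docs = ["the cat sat", "the dog sat down"]
vocab = build_vocabulary(docs)
print(vocab)                         # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'down': 4}
print(encode("the dog sat", vocab))  # [0, 3, 2]
```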

I have seen people discuss tokenization in two contexts with respect to search engines. Pages or documents can be tokenized (assigned a unique identifier, an ID number) and words within documents can be tokenized (also assigned a unique identifier, but one which is used in a different context from the page identifier).

The word tokens allow for more efficient document storage, content indexing, sorting, and querying.
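One common way document IDs and word tokens work together is an inverted index: for each word ID, keep the set of document IDs that contain it, so a query only has to intersect a few small sets instead of rescanning every page. The sketch below is a simplified illustration under that assumption; the sample pages are hypothetical, and the document ID is simply each page's position in the list.

```python
from collections import defaultdict

# Hypothetical pages; the document ID is the position in this list.
pages = [
    "cheap blue widgets",
    "blue widgets for sale",
    "red widgets",
]

term_ids: dict[str, int] = {}                 # word -> word token (term ID)
index: dict[int, set[int]] = defaultdict(set)  # term ID -> set of document IDs

for doc_id, text in enumerate(pages):
    for word in text.lower().split():
        # setdefault gives a new word the next unused ID.
        term_id = term_ids.setdefault(word, len(term_ids))
        index[term_id].add(doc_id)

def query_all(*words: str) -> list[int]:
    """Return sorted IDs of documents containing every query word (Boolean AND)."""
    postings = []
    for w in words:
        tid = term_ids.get(w.lower())
        if tid is None:
            return []  # unknown word: no document can contain all the terms
        postings.append(index[tid])
    return sorted(set.intersection(*postings)) if postings else []

print(query_all("blue", "widgets"))  # [0, 1]
print(query_all("red"))              # [2]
```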

Filtration is just what it sounds like. A filtering process is defined and applied to a data set in order to weed out or filter out undesirable data elements. Search engines strive to filter out spam pages, dead pages, duplicate pages, and irrelevant pages at various stages in their functions.
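As a small illustration of a filtering step, this Python sketch drops dead pages and exact duplicates from a hypothetical list of crawl records. The record layout (URL, HTTP status, text), the example.com URLs, and the "status 200 means live" rule are all assumptions for the example; it only catches byte-for-byte duplicate text and makes no attempt at spam or relevance filtering.

```python
import hashlib

# Hypothetical crawl records: (url, HTTP status code, page text).
crawled = [
    ("http://example.com/a", 200, "Blue widgets for sale."),
    ("http://example.com/b", 404, ""),
    ("http://example.com/c", 200, "Blue widgets for sale."),
    ("http://example.com/d", 200, "Red widgets in stock."),
]

def filter_pages(records):
    """Keep live pages (status 200) and drop exact duplicates by content hash."""
    seen_hashes = set()
    kept = []
    for url, status, text in records:
        if status != 200:  # dead or error page
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:  # identical text already kept under another URL
            continue
        seen_hashes.add(digest)
        kept.append(url)
    return kept

print(filter_pages(crawled))  # ['http://example.com/a', 'http://example.com/d']
```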

Stemming refers to reducing long, complex words to their roots. In effect, you take a word like "stemming" and equate it with "stem", a word like "difficulty" and equate it with "difficult", etc.

Stemming allows you to work with a smaller token set. It is also believed (and probably mathematically proven in some arcane way) to help qualify word collections: that is, you can associate related words by trimming as much excess baggage off them as possible and restoring them to their root or near-root forms.
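A deliberately crude suffix-stripping stemmer in Python shows both points above: related word forms collapse to one root, so the token set gets smaller. This is a toy with made-up rules, not the Porter algorithm or any other published stemmer, and it will mangle plenty of real words; it is only meant to show the idea on the examples mentioned in the post.

```python
# Toy suffix list, checked in this order (an assumption, not a standard rule set).
SUFFIXES = ("ing", "ed", "ly", "es", "s", "y")

def crude_stem(word: str) -> str:
    """Strip one common suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip if at least 3 letters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # "stemm" -> "stem", "runn" -> "run" (but leave double l, s, z alone).
    if len(word) >= 4 and word[-1] == word[-2] and word[-1] not in "aeioulsz":
        word = word[:-1]
    return word

words = ["stem", "stems", "stemmed", "stemming",
         "run", "runs", "running", "difficult", "difficulty"]
print([crude_stem(w) for w in words])
print(len(set(words)), len({crude_stem(w) for w in words}))  # 9 3
```

Real systems typically rely on a published algorithm such as Porter's stemmer, which handles far more cases than these few rules.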

Running is related to runabout is related to runup is related to runover, etc.

I don't see much value in these academic discussions on the forums. Most people don't understand the exotic disciplines involved, the participants have to rely upon a great deal of technical jargon (although they do make attempts to explain some of it), and the math is usually well beyond the average SEO's comprehension or point of caring.

I am not going to pretend to be able to absorb it all myself. But I've had enough computer science, set theory, vector algebra, and matrix algebra to know where they are coming from. I just don't care to end up where they think they are going.
lyn
post Apr 7 2005, 06:54 AM
Post #4


HR 6

Group: Active Members
Posts: 940
Joined: 28-April 04
From: London, Ontario
Member No.: 3,389



Excellent post, Martin.
I not only understood your explanation of the terms and where they fit in search engine tech, I also understood why, like you, I have no need to care! :D

Being a word guy, though, I do appreciate the definitions. Thanks.

L.
randfish
post Apr 7 2005, 01:15 PM
Post #5


Daily SEO Show Anchor

Group: Active Members
Posts: 229
Joined: 21-July 04
From: Seattle, WA
Member No.: 4,442



webkid - I use filtration and stemming all the time in designing tools for SEO and analyzing my own pages. As Jill says, most SEOs do not care or worry about it and success can certainly be achieved without these techniques.

However, for those of us, like myself, who are fascinated by how the search engines actually work and enjoy optimizing from that frame of mind - these concepts are invaluable.

Stemming is used in the actual analysis of a given document so that the most important concepts can be extracted. This is a method search engines use to help them find the topic of a particular site/page.

Filtration is used in all sorts of things, but the first example I always think of is its use in document analysis to remove stopwords.
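A minimal Python sketch of stopword filtration, followed by a simple count of the surviving terms as a rough signal of what a page is about. The stopword list is a tiny hypothetical one (real lists run to hundreds of words), the sample page is made up, and a stemmer could be applied to the surviving terms at this step as well; this is an illustration, not how any particular engine extracts topics.

```python
import re
from collections import Counter

# Hypothetical, very small stopword list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of",
             "on", "our", "the", "to", "we"}

def content_terms(text: str) -> list[str]:
    """Lowercase, split into alphabetic words, and filter out stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]

page = "Blue widgets are our best widgets. We ship widgets to the USA."
terms = content_terms(page)
print(terms)                          # ['blue', 'widgets', 'best', 'widgets', 'ship', 'widgets', 'usa']
print(Counter(terms).most_common(1))  # [('widgets', 3)]
```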

I think it's over-simplifying to say that these concepts are meaningless. To me, it's a critical part of my profession to understand everything I possibly can about the search engines' technology and processes. This doesn't mean that you can't do without this knowledge, but it's a very good feeling when I see changes in the SERPs and can reason them out and narrow them down to a few possibilities. I just don't like the feeling of being in the dark on the whole thing.

Dr. Garcia's work (the source of your inquiry) has been invaluable to many SEOs in the business, but as you can see, there are people on both sides of the fence. If you do want to read more on the subject, visit his website - www.miislita.com
Jill
post Apr 7 2005, 02:27 PM
Post #6


High Rankings Advisor

Group: Admin
Posts: 29,201
Joined: 21-July 03
From: Ashland, MA
Member No.: 2



I actually do know about stemming and filtration, although I prefer to discuss them in terms of my flowers and my tap water. ;-)

The other words, though, I've not heard in regard to SEO.
Michael Martinez
post Apr 7 2005, 02:37 PM
Post #7


HR 8

Group: Active Members
Posts: 3,718
Joined: 5-April 05
From: Seattle, WA
Member No.: 7,091



QUOTE(Jill @ Apr 7 2005, 02:27 PM)
I actually do know about stemming and filtration, although I prefer to discuss them in terms of my flowers and my tap water. ;-)

The other words, though, I've not heard in regard to SEO.


The jury is still out (at least, MINE is) on just how useful it is to understand document analysis for SEO. Most good SEOs seem to understand that CSS is preferable to table formatting, links help, using indexable human-readable text on pages is good, etc., etc.

Some folks just go in for the esoteric stuff.

And then some of us like to second-guess the search engines, although I don't like spending a lot of time talking about vector analysis.

Some SEO sites do use these buzzwords in their sales pitches. Whatever floats your boat, I guess.
Winooski
post Apr 8 2005, 12:57 PM
Post #8


HR 1

Group: Members
Posts: 3
Joined: 25-March 05
From: Northeast USA
Member No.: 6,994



BTW, if anyone's interested, the complete Garcia article is at www.e-marketing-news.co.uk/Mar05/garcia.html. It's actually a good read and (I believe) answers all of webkid_san's questions.

The context for Dr. Garcia's use of those terms is to bust the myth of the importance of arriving at specific keyword density metrics for optimization.
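One way to see why a single keyword density figure is fragile: the same page yields different numbers depending on preprocessing choices such as whether stopwords are filtered out first. The Python sketch below assumes density means "keyword occurrences divided by total words, as a percentage"; the stopword list and sample text are hypothetical, and this is only an illustration, not a summary of the method in the Garcia article.

```python
import re

# Hypothetical stopword list for the example.
STOPWORDS = {"a", "and", "for", "of", "the", "to"}

def keyword_density(text: str, keyword: str, drop_stopwords: bool) -> float:
    """Percentage of words on the page that equal `keyword`."""
    words = re.findall(r"[a-z]+", text.lower())
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

page = "Widgets for the home, and a guide to the widgets"
print(keyword_density(page, "widgets", drop_stopwords=False))  # 20.0
print(keyword_density(page, "widgets", drop_stopwords=True))   # 50.0
```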

This post has been edited by Winooski: Apr 8 2005, 01:06 PM
Raphael
post Apr 8 2005, 01:32 PM
Post #9


The Limey Cowboy

Group: Active Members
Posts: 722
Joined: 17-December 04
From: New England
Member No.: 5,984



Personally, I figure it's all voodoo and witchcraft, and I'm off to sacrifice a chicken to obtain good rankings in Google for my next site. =)
torka
post Apr 8 2005, 02:03 PM
Post #10


Vintage Babe

Group: Moderator
Posts: 4,142
Joined: 31-July 03
From: Triangle area, NC, USA, Earth (usually)
Member No.: 89



Me, too!

Well, technically, we're actually more like "grilling" than "sacrificing".

And, okay, it's a Whopper, not a chicken.

But the idea is the same! #1 on Google, here we come! Cheers!

--Torka

 



This forum is sponsored by High Rankings, a Boston SEO Agency