
Semantics


66 replies to this topic

#16 torka

torka

    Vintage Babe

  • Moderator
  • 4,392 posts
  • Location:Triangle area, NC, USA, Earth (usually)

Posted 19 January 2004 - 12:22 AM

Okay. I'm not going to say that I totally understand this stuff, because I'm certain there are nuances (and clearly some big honkin' crucial pieces as well) I haven't got yet, but I think I'm getting a handle on the theory, at least. I hope...

I've waded through the articles posted by Vijay and Grumpus. I even toddled over to the W3C Semantic Web Activity Statement and got a few paragraphs into that before my brain fell over. :zz: I'll have to save that and any other suggested reading for later.

Here's where my understanding is at this point. Tell me if I'm going off the track anywhere...

G's article appeared to concentrate on the theory -- how to use matrices and vectors to gauge the "semantic proximity" of different words in a selection of documents in order to determine the degree of relationship among those documents. From that standpoint it actually made sense to me. I guess I retained more of that statistical analysis from my old accounting days than I thought. Either that, or I'm totally confused and just don't realize it. Which is entirely likely.

I'm just glad I don't have to work out the math behind it. I've got enough trouble with three dimensions, nevermind thousands.

From reading that article, one might come away with the idea that good ol' familiar HTML would do the trick. As I understood (or didn't understand, as the case may be) what was written, apparently all they need is access to the actual text, from which they can exclude their stop words, run their stemming, generate a few matrices, compress the data, et voilà -- document clusters! (Like peanut clusters but without the yummy chocolate coating... :lol: )
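A minimal sketch of the first steps listed above (stop words, stemming, building a matrix), assuming a tiny hypothetical stop-word list and a crude suffix-stripper in place of a real stemmer. The compression step (singular value decomposition in the LSI literature) is left out, and nothing here is claimed to be how any search engine actually does it.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real systems use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "for"}

def crude_stem(word):
    """Strip a few common suffixes. A stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split into words, drop stop words, stem what is left."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (sorted vocabulary, matrix) with one row per term, one column per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

docs = [
    "Guitars and guitar strings",
    "Tuning the strings of a guitar",
    "Baking chocolate peanut clusters",
]
vocab, matrix = term_document_matrix(docs)
```

Each row of `matrix` is one stemmed term and each column is one document, so documents that share rows with non-zero counts are the ones that would end up clustered together.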

But Vijay's article was talking about pointers and RDF and whatnot, and what I can gather from the little of the W3C document I managed to absorb before brain failure, RDF and XML have something to do with each other.

Then I re-read G's analysis of Vijay's article (previous post in this thread) in which he points out that Google/Froogle is already extracting this semantic information from non-RDF documents and doing a pretty good job of it all things considered (even if they are trying to sell off a forum moderator, and pretty cheaply, too, I might add ;) ).

So, if I'm understanding things correctly... we don't have to convert everything over to XML (at least not right away) in order for this to work. Which is a good thing, because there are a buncha individuals and mom & pops out there (and some companies who ought to know better and could certainly afford the upgrade) who haven't even started using CSS and HTML4.x, much less XHTML or XML.

But we do have to use consistent page design and content structure, both within our own site and with other related sites in our "semantic neighborhood" if we want to increase the chances that our documents are accurately analyzed and rated.

And eventually, we're all going to have to learn XML, whether we like it or not. :)

So, now, how does one actually optimize RIGHT NOW (and in the next year or two) under such a scenario? Or am I jumping the gun by asking?

--Torka :lol:

#17 Scottie

Scottie

    Psycho Mom

  • Admin
  • 6,293 posts
  • Location:Columbia, SC

Posted 19 January 2004 - 12:27 AM

[quote name='"Grumpus"']Many of these pages have things like:

buy [Scott Rahin's Martin guitar]. [Scott Rahin's Martin Guitar] is a Model 245. [Scott Rahin's Martin Guitar] rocks! You're stupid if you don't buy [Scott Rahin's Martin guitar]!

This is overly simplified, but you'll notice that in the original example, it identifies concepts through separation. "Scott" has "Guitar". "Guitar" is "model 245".

In the second example, Scott is never separated from his guitar, so it's "Scott Rahin's Martin Guitar" no matter how you cut it. You can't extrapolate any semantics from it because it's always used in the same way and as a complete phrase.[/quote]

Ok Grumpus- :lol:

Based on the very interesting Middlebury.edu article you posted, the words relative to each other don't matter. All that matters is frequency and "uniqueness" in the indexing. The words are stripped to their stem version, counted and mapped, then normalized (density adjustments) and compared to pages with the same words.

If we believe that is the route Google has chosen, it would mean they would take the existing stores of indexed pages (which are lists of words), filter the stop words out, stem the existing content words and then adjust the algo to do some clustering/community analysis based on the similarity between pages.

Not AI, but an advanced form of analysis which wouldn't require RDF's or assumptions as to relationships. Content matching on a page content-by-page content basis as opposed to a word-by-word detailed matching. Seems doable.
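To make the page-by-page comparison concrete, here is a small sketch using cosine similarity over word counts. The page names and word lists are hypothetical, and cosine similarity is just one common choice for this kind of comparison, not a confirmed part of any engine's algorithm.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count mappings (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical pages, already stripped of stop words and stemmed.
pages = {
    "guitar-shop": Counter("guitar martin string guitar model".split()),
    "guitar-blog": Counter("guitar string tune chord".split()),
    "candy-shop": Counter("peanut chocolate cluster".split()),
}

def most_similar(name):
    """Return the other page whose word profile is closest to `name`."""
    others = [p for p in pages if p != name]
    return max(others, key=lambda p: cosine(pages[name], pages[p]))
```

Because the score is divided by each page's vector length, simply repeating a word more often does not keep raising the similarity, which is roughly what the "normalized (density adjustments)" step is for.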

But this theory would still reward keyword-stuffing, right? Possibly a keyword density threshold tweak in there. It could still work hand-in-hand with LocalRank...

Some interesting things to think about there. :learn: No matter how you slice it up, text is going to remain a key factor.

#18 Steve Sardell

Steve Sardell

    HR 5

  • Active Members
  • 331 posts
  • Location:Hilton Head Island, SC, USA

Posted 19 January 2004 - 12:41 AM

Hi Scottie,

Glad to hear it was a little help. Applied Semantics and Kaltix were both acquired by G within a short time of one another, thus causing confusion for many of us. The thought is that G is attempting to meld the two, from which they hope to bring out localized search. From what I have gleaned, it is a race.

Not sure if the articles below have previously been posted. The first is a white paper on A.S.; the second, from Scientific American, is a much easier read.

Topic Sensitive Page Rank explained.

http://www2002.org/C.../127/index.html

http://www.scientifi...umber=1&catID=2

#19 Steve Sardell

Steve Sardell

    HR 5

  • Active Members
  • 331 posts
  • Location:Hilton Head Island, SC, USA

Posted 19 January 2004 - 12:47 AM

[quote name='"Scottie"']Some interesting things to think about there. :learn: No matter how you slice it up, text is going to remain a key factor.[/quote]

IMHO text will become much more important. SEOs are going to need to become well-versed writers, and be able to make the page flow much more naturally.

#20 powerofeyes

powerofeyes

    HR 7

  • Active Members
  • 1,123 posts
  • Location:INDIA

Posted 19 January 2004 - 01:22 AM

OK, where is Google right now with semantics? No one can be sure what is really happening inside Google, but from Google's own public statements we know that context matching is used in AdSense. AdSense, a context-matching targeted advertising program, formerly belonged to Applied Semantics and has been owned by Google since Google bought Applied Semantics.
Andrew Goodman explains it more clearly in his article on trafficks.com:
Content targeting and Adsense

So we can conclude that Google is using semantics to some extent in AdSense. Recently I suggested in another thread that Google may be using semantics in AdWords too. I am not sure about that, but something tells me there is context matching happening in AdWords, so there is probably some semantics there.

Where else can we see it visibly?
How about Google's very good spell check feature? How many of you have noticed that it is not just a spell checker, but that it also matches the spelling to the context of the query? To quote what GoogleGuy said:

The spellchecker is context-sensitive, so it's much more than just a set of suggested corrections.

OK, the big question is: is Google using semantics in the organic SERPs? I suspect a bit of it is already there. Many of us have noticed sites ranking without having the keywords on the page, in the URL, or in the anchor text pointing at the site. How does this happen? Possibly because semantics is at play. In the near future, concepts might replace keywords and phrases. Search engines might become so intelligent that they simply understand the meaning of the phrase and the motive of the user, and give results without any sort of keyword matching on the site.
Phrases will become concepts: from crawling the web, the engines will gather lots of different concepts related to a keyword or phrase and return results based on them. This is still some way off, but it is very much possible before long.
I'd like to quote a post made by a user at Webmasterworld.com back in April, when Google buying Applied Semantics was hot news. I cannot believe he made such a beautiful prediction.

Here is the quote:

But longer term, here's what you guys have been missing about this story. As semantic technology begins to make a stronger contribution to web search, SEO as formerly practiced becomes a quaint anachronism. You cannot "optimize" nearly as easy for ideas as you can for keywords. For example, you could keyword-optimize a title really well for a phrase, and find yourself ranking lower than some site which doesn't even contain those exact words anywhere! Let's say that on semantic grounds, those words are counted as matching the search query 81% as well as an exact match could be. At present, the lack of those words on even a high quality site would literally bring it to "0%" on that phrase, so the site wouldn't rank on that phrase no matter how good or topical the site was. Now (and by now I mean a couple-three years from now), clearly reputable sites on a given topic may begin beating out lower-quality "cleverly keyword-optimized" sites even on the keywords they've carefully optimized for and even if those exact keywords don't appear on the higher-quality site.


link to the quote: http://www.webmaster.../12127-1-75.htm
How many of us are seeing his prediction almost coming true in Google now? So this could be semantics venturing into the organic SERPs.

Another interesting read on semantics is this document, where they analyze natural language using the phrase "Bill Kisses Hillary": LFG Semantics through constraints.
Another theory that seems to be at work in Google is Professor Kleinberg's theory of determining whether a site is a hub or an authority based on its contextual relationships.
I'll post about it in my next posting.


VIJAY.

#21 Grumpus

Grumpus

    HR 6

  • Active Members
  • 786 posts

Posted 19 January 2004 - 11:08 AM

I watch a couple of football games and do the "sleep" thing and look - the thread goes crazy. Cool. :) Good stuffs in here.

[quote name='"Scottie"']And, how do they determine trustworthiness of the sources? Are we back to PR ratings?[/quote]

Topic sensitive PR could do it for the sectors for which topics are defined. LocalRank could play into it as well - though obviously the search would have to be made to generate the "term/localRank relationship". Google is also tracking clicks from time to time nowadays. Many speculate that this is going to give sites that get lots of traffic a boost in the ranks. I don't think so (well, it will, sort of). Those clicks (if the user doesn't go back to the same set of SERPs and click something else again) help to determine satisfaction. If I don't go back to the SERPs after visiting a site, I most likely got my answer, and therefore that site is a good source of information on that subject.
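As a rough illustration of the "did the searcher come back?" idea, here is a sketch that turns a click log into a satisfaction rate per result. The log format, the URLs, and the idea of scoring it this way are all assumptions for illustration; nobody outside Google knows whether or how clicks are used.

```python
def satisfaction_rate(click_log):
    """click_log: list of (url, returned_to_serp) pairs for one query.

    Returns {url: fraction of clicks after which the user did not come back}.
    """
    clicks, satisfied = {}, {}
    for url, returned in click_log:
        clicks[url] = clicks.get(url, 0) + 1
        if not returned:
            satisfied[url] = satisfied.get(url, 0) + 1
    return {url: satisfied.get(url, 0) / n for url, n in clicks.items()}

# Hypothetical log: True means the searcher came back and clicked something else.
log = [
    ("a.example", False), ("a.example", False),
    ("b.example", True), ("b.example", False), ("b.example", True),
]
rates = satisfaction_rate(log)
```

Note that `b.example` gets more clicks but a lower rate, which is the distinction being drawn above between raw traffic and satisfaction.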

[quote name='"Scottie"']Heck, we've now got two pages on the web that appear to confirm that Ray Charles is love.[/quote]

If there are enough sites out there so that a pattern emerges, then yes, that assumption could be made. Chances are, though, that the word love is going to appear contextually before "Ray Charles" as in "I love Ray Charles' music" more often than anything else on those pages. Pages that actually have the example you gave are likely humor type sites and there will be no other mention of Ray Charles, so you're not going to get the confirming numbers needed.

The comparison of two documents just won't cut it. If I'm a baseball player and I play in two games and have a batting average of .400, it is a sign that I might be a good player. If you've played in 500 games and you have an average of .300, you're definitely a good player, and even though that's less than .400, smart money is going to give you the advantage when we go head to head in a batting derby.
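The batting-average point is really about sample size, and one standard way to put numbers on it is the lower bound of the Wilson score interval. The at-bat counts below are assumptions picked to match the analogy (roughly 10 at-bats for two games, 2000 for 500 games); this is a statistics illustration, not anything a search engine is known to use.

```python
import math

def wilson_lower_bound(successes, trials, z=1.96):
    """Lower end of the Wilson score interval for a proportion (z=1.96 is about 95%)."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom

# Assumed at-bat counts: 2 games is about 10 at-bats, 500 games about 2000.
rookie = wilson_lower_bound(4, 10)       # batting .400 on very little evidence
veteran = wilson_lower_bound(600, 2000)  # batting .300 on a lot of evidence
```

The veteran's lower bound comes out above the rookie's, which is the "smart money" argument: a lower average backed by many observations is the safer bet.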


In the LocalRank Arena:
[quote name='"Scottie"']Standard PR is used to pull the top 1000 (or whatever number they choose) results.[/quote]

The number is likely much lower with respect to LocalRank, but that's pretty much the gist of it. 1000 is the total number of documents in the set that are ranked, but since people are only going to look at the first couple of pages (as a general rule) LocalRank is likely calculated on a smaller number to save on resources. Once you get down below 100 or 250 or whatever, the bonus you get from this won't move you close enough to the top to make it worth doing the math anyway.

There are three steps to Google's calculations on a query.

Step 1: Google comes up with an unsorted list of relevant documents.
Step 2: Google sorts these documents using PR, keyword existence/density, etc. This phase deals mainly with the "on-page" factors. It then takes the top 1000 documents (that number we know is true).
Step 3: Off-page factors kick in on the top X pages so that they can be re-sorted amongst themselves - inbound link text, LocalRank, and all of the stuff that deals with the "relationships between pages".
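The three steps can be sketched as a small function. The scoring callables, the document names, and the cut-off values are hypothetical stand-ins; this only shows the shape of "filter, sort and cut, then re-sort the head" as described in the post, not Google's actual code.

```python
def rank(query_terms, index, on_page_score, off_page_bonus, top_n=1000, rerank_x=100):
    """index maps each document id to the set of words on that page."""
    # Step 1: unsorted list of documents containing every query term.
    relevant = [d for d, words in index.items() if all(t in words for t in query_terms)]
    # Step 2: sort by the on-page score and keep the top_n.
    ranked = sorted(relevant, key=on_page_score, reverse=True)[:top_n]
    # Step 3: re-sort only the first rerank_x using on-page score plus off-page bonus.
    head, tail = ranked[:rerank_x], ranked[rerank_x:]
    head.sort(key=lambda d: on_page_score(d) + off_page_bonus(d), reverse=True)
    return head + tail

# Hypothetical data: four pages, their on-page scores and off-page bonuses.
index = {
    "a": {"martin", "guitar"},
    "b": {"martin", "guitar", "shop"},
    "c": {"guitar"},
    "d": {"martin", "guitar"},
}
on_page = {"a": 3.0, "b": 2.0, "c": 9.0, "d": 1.0}
off_page = {"a": 0.0, "b": 2.0, "c": 0.0, "d": 5.0}
result = rank(["martin", "guitar"], index, on_page.get, off_page.get, top_n=3, rerank_x=2)
```

Page "d" has the biggest off-page bonus but sits outside the re-ranked head, so it does not move, which matches the point that the bonus is not worth calculating far down the list.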

[quote name='"Torka"']I'm just glad I don't have to work out the math behind it. I've got enough trouble with three dimensions, nevermind thousands.[/quote]

Amen to that! There probably even comes a point where too much knowledge/understanding of it is going to be detrimental as you'll end up second guessing yourself all the time. :D The important thing is to just have an understanding of the concepts and the potential of the technology. Looking around the forums and talking to my SEO pals since Florida update there is one definite finding that is constant.

Those of us who understood and had been employing tactics to accommodate Semantics, LocalRank, TSPR, and other "up and coming" elements related to those (all of which we've known about since late 2002 and early 2003) didn't report any major problems. Sure, maybe something lost a page or two in the SERPs, but the stuff either came back on its own or a minor adjustment fixed it. Dropping a page or two can happen with any update that employs an algo change - heck, that's what the algo change is designed to do - change the results.

Those who didn't know about these technologies or chose to put them in the "This isn't going to help me now, I'll worry about it when it happens" folder, had some sites that were relatively unaffected and some sites vanish off the face of the earth. (Most likely they ended up on Mars because it's warmer than Rochester.)

[quote name='"Torka"']the W3C document I managed to absorb before brain failure, RDF and XML have something to do with each other.[/quote]

Think of RDF (Resource Description Framework) as the foundation. Think of XML (eXtensible Markup Language) as the formatting language used to deliver it.

HTTP (Hypertext Transfer Protocol) is to HTML (Hypertext Markup Language) as RDF is to XML.

[quote name='"Torka"']we don't have to convert everything over to XML (at least not right away) in order for this to work.[/quote]

Nope. We don't. And we probably won't. Though for specific applications (like that marketplace thing described in Vijay's article, Froogle, etc.) it makes sure that there are no errors, problems, bad assumptions. When it comes to specific applications, there is little room for error because folks are looking to perform a specific task and get 100% accurate results.

Searching the web is different. Because it's all encompassing and it delivers "everything" there is simply no way to deliver 100% accurate results - unless you were to require everyone provide a formatted feed - but then, you end up not delivering "everything" anymore.

(Wanna get rich? Start a search engine that uses an RDF feed from all its sources and bill it as an "accurate" and not a "comprehensive" engine. Do it fast, though - everyone and their brother will be doing it soon!).

[quote name='"Torka"']So, now, how does one actually optimize RIGHT NOW (and in the next year or two) under such a scenario?[/quote]

See the above two answers. As far as the search engines go (at least with Google and Ink and the other players that are on the "search the entire web" track) it won't be critical to know XML. (At least I don't think so). Since Froogle started out (which is the best way to gauge Google's success in this area) Google has made huge strides in extracting the proper information from a page.

We can see AllTheWeb playing with this type of technology, too. If you do a search, you'll notice that they provide two snippets from the page. The first is the same type of snippet that Google provides - it just grabs the keywords and shows you the phrase(s) near one instance of the term. Then you have the description section. These are complete sentences that describe the content of the page (in most cases). These descriptions may or may not have the keywords in them. In some cases, they may be the DMOZ description of the site, but in most (since most "pages" don't have a DMOZ listing) it's extracted right from the page.

Example: AllTheWeb Search for "anteater biology". Except for the USGS Usage Stats page, the descriptions are summary sentences, there are no instances of the word "anteater" in them, and only 1 has the word "biology". Yet each provides an excellent description of what you are going to find on the page.

How did they do it? Well, they found the "element" on the page that, through semantic pattern recognition, best describes the overall topic of the page. In this case though, the semantics aren't applied in relation to the search term, but to identifying the overall focus of the page.

This is done using the DMOZ rdf dump as the "seed" but the end results are extrapolations from that.

Pretty neat stuff, huh?
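One simple way to pick "the sentence that best describes the page" is to score each sentence by how common its words are across the whole page. This is only a guess at the flavor of the technique, with a hypothetical stop-word list and sample page; it is not AllTheWeb's method and it does not use the DMOZ data mentioned above.

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "it", "on", "with"}

def words(text):
    """Lowercased words of `text`, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

def best_description(page_text):
    """Return the sentence whose words are, on average, most frequent across the page."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    freq = Counter(words(page_text))

    def score(sentence):
        ws = words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    return max(sentences, key=score)

page = ("Click here for photos. Giant anteaters eat ants and termites. "
        "Anteaters use long tongues to catch ants. Contact us.")
```

Notice that the score depends only on the page itself, not on the search term, which is the distinction made in the post.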

[quote name='"Scottie"']Based on the very interesting Middlebury.edu article you posted, the words relative to each other don't matter. All that matters is frequency and "uniqueness" in the indexing. The words are stripped to their stem version, counted and mapped, then normalized (density adjustments) and compared to pages with the same words.[/quote]

Pretty much - but the foundation of the whole concept is based on being able to do this by extracting these "concepts" from "natural language". In highly competitive areas where the norm is to highly optimize a page and to focus on keywords rather than natural language - yes, keyword stuffed pages would get bonuses. In other less competitive sectors where SEO isn't done as much, natural language flows and it all happens normally.

So, this presents a problem for Google. Large batches of keyword rich pages all fighting over the same terms create an artificial phrase that doesn't normally exist in natural language. Semantics isn't capable of finding out what's real or artificial, so how can Google use this system to rank on natural occurrences rather than artificial ones?

Well, I'll tell ya. :eek:

1) CKDA (I just invented that acronym. It stands for Complex Keyword Density Analysis). In the olden days, keyword density was calculated based upon the number of times a keyword appeared in comparison to the total number of words on the page. Too low, and it's not going to rank. Too high, and you're stuffing so it's not going to rank. CKDA takes it further - it does keyword density in the title tag (the number of times the keyword appears in the title tag compared to all the words in the title tag). It does an anchor density analysis (number of times compared to all the different words within anchor tags on the page). It does the same thing with formatting tags (words that appear in <B>, <EM>, <H1>, etc. compared to all words within those tags). And, even though they aren't really used for ranking - alt tags, title tags, meta tags, etc. could all be used for density analysis to determine if it's a naturally occurring instance of the word, or if it's a forced instance.
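A sketch of what a per-region density calculation could look like, using Python's standard `html.parser`. The region names, the set of "emphasis" tags, and the single-word keyword restriction are all assumptions made for the example; CKDA is the poster's own label for a suspected behavior, so this is not a description of a known Google feature.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

EMPHASIS = {"b", "strong", "em", "i", "h1", "h2", "h3"}

class RegionWords(HTMLParser):
    """Sort the words of a page into regions: title, body, anchor, emphasis."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.regions = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Remove the most recently opened matching tag.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        found = re.findall(r"[a-z0-9]+", data.lower())
        if "title" in self.open_tags:
            self.regions["title"].extend(found)
            return
        self.regions["body"].extend(found)
        if "a" in self.open_tags:
            self.regions["anchor"].extend(found)
        if EMPHASIS & set(self.open_tags):
            self.regions["emphasis"].extend(found)

def densities(html, keyword):
    """Return {region: share of that region's words equal to the one-word keyword}."""
    parser = RegionWords()
    parser.feed(html)
    parser.close()
    kw = keyword.lower()
    return {region: ws.count(kw) / len(ws) for region, ws in parser.regions.items() if ws}

html = ("<html><head><title>Guitar guitar guitar shop</title></head><body>"
        "<h1>Martin guitar</h1><p>We sell one <a href='/m'>Martin guitar</a> "
        "and many strings.</p></body></html>")
d = densities(html, "guitar")
```

Here the title density (0.75) stands out against the body density (0.2), which is the kind of lopsided profile the post suggests could flag a forced use of the word.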

We know (or at least I know based upon enough observation to be able to consider it as fact in my own mind) that Google is now utilizing CKDA and that they might even be working with highly tightened ranges of acceptable density. They have seemingly loosened up the ranges a bit since Florida first hit. It's still in play, though.

2) Natural Language Identification. I can't be 100% certain that this is in use. We know that ATW (see my example above) has the capability of extracting a natural language description from a page - i.e. to be able to say "This is a sentence". We can assume that Google has the technology to do this, as well. Now, as I say, they may not be fully utilizing it, but the possibility is there. Google wants to provide results based upon natural occurrences, so it only makes sense that Google is (or will be soon) using something along these lines to make certain that at least a good part of a page has some natural language in it.

Obviously, pages have sections that don't use natural language like navigational elements and such so it's not looking for 100%. It may not even be looking for 50%. It might just say, "are there several sentences here?" I'm not sure.
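The "are there several sentences here?" test could be as crude as the heuristic below. The thresholds and the definition of "sentence-like" are assumptions picked for the example; real natural language identification would be far more involved.

```python
import re

def sentence_like(fragment, min_words=5):
    """Rough test: starts with a capital, ends with . ! or ?, and has enough words."""
    fragment = fragment.strip()
    return (bool(re.match(r"[A-Z]", fragment))
            and fragment.endswith((".", "!", "?"))
            and len(fragment.split()) >= min_words)

def has_natural_language(text, min_sentences=3):
    """True if the text contains at least min_sentences sentence-like fragments."""
    fragments = re.split(r"(?<=[.!?])\s+|\n+", text)
    return sum(sentence_like(f) for f in fragments) >= min_sentences

# Hypothetical page texts: one keyword-stuffed, one with real copy plus navigation.
stuffed = "cheap guitars | buy guitars | guitars guitars | martin guitars"
natural = ("We have sold guitars in Columbia since 1980. "
           "Every instrument is set up by hand before it ships. "
           "Call us if you have any questions about an order.\n"
           "Home | Products | Contact")
```

The navigation line in `natural` fails the sentence test but does not stop the page from passing, in line with the point that the check would not be looking for 100%.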

3) Other things I haven't thought of.

So, when it comes down to a means of coming up with a way to ensure that keyword stuffed pages don't start creating "artificial" instances of phrases that will go into the database that handles the semantics, it merely runs a few checks before it gets there and if the pages don't qualify, they don't get to contribute to the information there.

As I say, and this is important, in some sectors that are all highly competitive and highly optimized, then there may not be enough pages that qualify to contribute to a valid "semantics database" relating to that sector. This is why it may appear that there are several algos running at Google. It's not that there are several algos, it's that there is a gaping hole in a key element of the ranking right now. Suddenly changing from a keyword rich page to one that is more natural isn't going to help you right now. It'll happen slowly, over time, as more pages begin to use the technique.

And remember, Google is famous for taking something it doesn't like and rather than just wiping everything out, they nudge it out of existence. It starts out where you lose a few spots, then a few updates later, you lose a few more. And pretty soon you're gone (if you haven't been slowly adapting to what they now want from you).

Right now, in most sectors, keyword richness still works. Over time (probably within the year) it'll work less and less well. You'll still need keywords, but they'll need to appear naturally and no stuffing allowed.

----

Vijay - I have to go actually get some work done today. I'll check out your links from your last post later this afternoon. Thanks for posting them!

G.

#22 Ruud

Ruud

    HR 4

  • Active Members
  • 129 posts
  • Location:Rimouski, Canada (Quebec)

Posted 19 January 2004 - 03:21 PM

OK, this isn't as simple to wrap my head around as is plain ol' keyword use :-)

What it does explain to me (even though I don't completely understand the details) is WHY several of my pages that HAD been optimized for a specific search phrase kept ranking #1 in Google for that search phrase almost a year after that phrase has been removed. The topic of those pages is still the same but the search phrase is gone, yet Google kept (and keeps) assuming I'm the best resource for it. Interesting is that this already happened... mmm... before last Summer I would say.

I wonder how far this reaches. For instance, I have a site devoted to positive news. I've tried pointing that out in footers, on the front page, in side bar links, in links to about pages - and to my feeling Google seems to have no clue what my site is about. It still seems to treat each individual page as a separate subject. It will send someone doing searches on autism over to a specific article but *not* on positive news autism or good news autism or even just plain news autism. But maybe it can't make out what it is about due to a lack of inbound links. The links that are incoming come from people around the web posting about a specific item they read somewhere.

Ruud

#23 Grumpus

Grumpus

    HR 6

  • Active Members
  • 786 posts

Posted 19 January 2004 - 04:06 PM

[quote name='"Ruud"']Interesting is that this already happened... mmm... before last Summer I would say.[/quote]


Parts of it have been around a while. LocalRank started (by my best estimations) in November or December of 2002. Inbound link text counting as something has been around too.

I also suspect that the separate "relational" database has been around for a good long while - even though it didn't have the semantics type things in it.

In regards to your "news" stuff, it's likely that through your other pages that Google has determined that each article page is a page that contains "news" on a specific subject. You rank well because the subject of the news is identifiable and the rest of your site shows that it is, in fact, "news".

"Good News" is a pretty broad range of topics, so it'll be hard for Google to home in on just what that means. Using words like "human interest stories" and other commonly used phrases to describe it might help. I'm not particularly strong in the areas of picking keywords - mainly because I don't know the language of each particular sector. Someone else may have better insight as to what to target for your broader based pages.

G.

#24 powerofeyes

powerofeyes

    HR 7

  • Active Members
  • 1,123 posts
  • Location:INDIA

Posted 20 January 2004 - 05:06 AM

OK, I said in my previous message that Google is possibly using semantics, or context sensitivity, in its spell check feature. When a query is typed into Google, it clearly tries to understand the motive of the searcher. They can clearly identify the meaning between "in" and "to" in a sentence.
As we know, replacing "in" with "to" gives the sentence a completely different meaning. Does Google understand this? Of course it does.

For example:
I searched for these phrases ("car shpping to other countries" and "car shpping in other countries"), using the misspelling "shpping", which could be either shopping or shipping.

For the first phrase, car shpping to other countries, Google suggested:

Did you mean: car shipping to other countries
They suggested "shipping" for "shpping", which is grammatically perfect; it would have been wrong if Google had suggested "shopping" there.

For the second phrase, car shpping in other countries, Google suggested:

Did you mean: car shopping in other countries
They suggested "shopping" for "shpping", which again is grammatically perfect; it would have been wrong if Google had suggested "shipping" there.
So Google is well advanced and can clearly understand the motive of the user. It can understand the user's natural language, which is a really excellent achievement in search technology. They possibly do this by applying semantics.
I tried the same searches in MSN, which cannot identify the meaning of the sentence at all. AllTheWeb suggests "shopping" for both searches and AltaVista suggests "shipping" for both, so Google is the most accurate at judging the meaning of the sentence and the motive of the user. They are reaching real heights in search technology.
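The shipping/shopping behavior can be reproduced with a very small model: generate candidate corrections, then pick the one that most often appears before the next word. The bigram counts and the two-word vocabulary below are invented for illustration, and this is only one plausible mechanism, not a description of how Google's spellchecker works.

```python
def edits1(word):
    """All strings one letter-insertion away from `word` (enough for 'shpping')."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return {word[:i] + c + word[i:] for i in range(len(word) + 1) for c in letters}

# Hypothetical bigram counts standing in for statistics from a large query log.
BIGRAMS = {
    ("shipping", "to"): 900, ("shopping", "to"): 40,
    ("shipping", "in"): 120, ("shopping", "in"): 700,
}
VOCAB = {"shipping", "shopping"}

def suggest(word, next_word):
    """Pick the known word one insertion from `word` that most often precedes next_word."""
    candidates = edits1(word) & VOCAB
    if not candidates:
        return word
    return max(candidates, key=lambda c: BIGRAMS.get((c, next_word), 0))
```

With these made-up counts, `suggest("shpping", "to")` picks "shipping" and `suggest("shpping", "in")` picks "shopping", so word-pair statistics alone are enough to produce the effect observed above.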

Another application I see in the current Google is Professor Kleinberg's theory. He divides pages on the web into authorities and hubs. I think we discussed this briefly in the Google update thread.

An extract from the document:

According to Kleinberg there are two types of useful pages. An authority page is one that contains a lot of information about the topic. A hub page is one that contains a large number of links to pages containing information about the topic — an example of a hub page is a resource list on some specific topic. The basic principle here is the following mutually reinforcing relationship between hubs and authorities. A good hub page points to many good authority pages. A good authority page is pointed to by many good hub pages.     


Reference to the document: Automatic resource compilation by analyzing hyperlink structure and associated text (web.ethz.ch/WWW7/1898/com1898.htm)
As far as I can see, there is a mixture of hub and authority pages at the top of the results, and these hubs and authorities seem to interact with PageRank and semantics in ranking for a query. For some queries the results seem to be entirely hubs; I don't know why that happens.
Here is the original Kleinberg document: Authoritative sources in a hyperlinked environment
It explains graphs, authorities, hubs, communities and a lot more. A very interesting document to read.
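The mutually reinforcing relationship in the extract (good hubs point to good authorities, good authorities are pointed to by good hubs) is usually computed by simple iteration. Below is a minimal sketch of that hubs-and-authorities iteration on a hypothetical four-page link graph; the page names and iteration count are assumptions, and whether Google uses anything like it is the poster's speculation.

```python
import math

def hits(links, iterations=20):
    """links: {page: [pages it links to]}. Returns (hub, authority) score dicts."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing at it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded between iterations.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical graph: two resource lists linking out to three content sites.
links = {
    "list1": ["siteA", "siteB"],
    "list2": ["siteA", "siteB", "siteC"],
    "siteC": ["siteA"],
}
hub, auth = hits(links)
```

In this toy graph the resource list with the most outbound links to well-linked pages ends up as the top hub, and the page everyone links to ends up as the top authority.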
As far as I can see, natural language will take over from keyword optimization in the future and SEO is going to be a lot more challenging. Those who understand what is happening will definitely prosper in this field. I saw a wonderful thread in the Creasite forums where some experts shared their views on the future of SEO. I'd like to link to that thread since it is a nice read: Is it time to get out of SEO business


thanks,
VIJAY,

#25 Ruud

Ruud

    HR 4

  • Active Members
  • 129 posts
  • Location:Rimouski, Canada (Quebec)

Posted 20 January 2004 - 09:41 AM

[quote name='"powerofeyes"']They can clearly identify the meaning between "in" and "to" in a sentence,[/quote]


Damned interesting. I wondered if it would maybe be a 'normal' spell checking feature but when tried in MS Word 2003 it suggests both shipping and shopping (in that order) for both.

[quote name='"Grumpus"']In regards to your "news" stuff, it's likely that through your other pages that Google has determined that each article page is a page that contains "news" on a specific subject. You rank well because the subject of the news is identifiable and the rest of your site shows that it is, in fact, "news".[/quote]


Alright... so, combining this with what I've read and with your CKDA idea... We need pages with the right keywords and keyword phrases - without them, however nice the concept of advanced semantics, Google has no clue what the page is really about and can only start guessing about it based on other content on the site, site structure and external links to it. Now having grasped the general topic of the page by doing a "normal" look at keyword frequency it hands the page over to another tool which tries to determine if the page makes sense. Do we have the term "digital photography" appearing with 20 times the word "freezer"? If so, then the term "digital photography" was probably used as an example or whatever but is not the actual topic of the page as "digital photography" and "freezer" are not close enough to each other in meaning and subject. Which makes me understand a lot better what you and some others have been writing; that it is about the keywords and the related keywords.... I see... (having my coffee while typing and stuff is dawning on me, lol)...

So.... an article about "digital photography" where that term is repeated ad nauseam is likely less relevant than one where the phrase is surrounded with words such as "photos", "organise", "IPTC", "JPEG", "Photoshop", "resolution" etc.
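To make that comparison concrete, here is a minimal sketch, assuming a hand-made list of related terms. The `RELATED` set and the `support_score` function are hypothetical illustrations, not how Google actually scores pages; the point is only that repeating the phrase adds nothing, while surrounding it with related words does.

```python
import re

# Hypothetical related terms for "digital photography" (taken from the post above).
# A real engine would learn these from a large corpus, not from a hand-made list.
RELATED = {"photos", "organise", "iptc", "jpeg", "photoshop", "resolution"}

def support_score(text, phrase, related=RELATED):
    """Count distinct related terms on the page, but only if the phrase itself appears."""
    lowered = text.lower()
    if phrase not in lowered:
        return 0
    words = set(re.findall(r"[a-z0-9]+", lowered))
    return len(words & related)

stuffed = "Digital photography! Digital photography tips. Buy digital photography now."
natural = ("Digital photography tips: organise photos, edit JPEG files "
           "in Photoshop, check resolution.")

print(support_score(stuffed, "digital photography"))
print(support_score(natural, "digital photography"))
```

The stuffed page repeats the phrase three times but finds no support from its neighbours, while the natural page mentions it once and is backed by five related terms.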

...which is where the idea of Google "grasping" the concept of a page comes in... as it has these clusters of related terms it can shift directions and work the other way around - if needed/wanted. This page with a lot of talk about "photos", "organise", "IPTC", "JPEG", "Photoshop", "resolution" etc. is most probably about "digital photography" - and now this page can be served as a search result for that term even though the term itself would never ever occur on that page...
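The "other way around" direction can be sketched as well. This is only an illustration under stated assumptions: the `TOPIC_CLUSTERS` table and the `guess_topic` function are made up for the example, and a real system would derive its clusters statistically rather than from a dictionary typed in by hand.

```python
import re

# Hypothetical clusters of related terms per topic (illustrative only).
TOPIC_CLUSTERS = {
    "digital photography": {"photos", "organise", "iptc", "jpeg", "photoshop", "resolution"},
    "kitchen appliances": {"freezer", "fridge", "oven", "defrost", "dishwasher"},
}

def topic_scores(text, clusters=TOPIC_CLUSTERS):
    """For each topic, return the fraction of its related terms found in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {topic: len(words & terms) / len(terms) for topic, terms in clusters.items()}

def guess_topic(text, clusters=TOPIC_CLUSTERS):
    """Return the best-matching topic, or None if no related term appears at all."""
    scores = topic_scores(text, clusters)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Note that the phrase "digital photography" never occurs on this page.
page = "Organise your photos by IPTC tags, then fix JPEG resolution in Photoshop."
print(guess_topic(page))
```

The page is assigned to "digital photography" purely from its surrounding vocabulary, which is the behaviour described above.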

But that is (again!) damned interesting!

So yes... now I see... outbound links to relevant *pages*, not just to related sites, reinforce the probable meaning of the page.... Links to related on-site articles do the same.... It's a blog-o-sphere, lol. If Google would fully go this way Google-bombing should become a thing of the past unless it is done within the right content, the right context (does that mean the occurrence of Google-bombing can indicate to what level Google 'understands' its pages?).

I think this is excellent stuff, excellent 'news'. And apart from the search mechanism, nothing seems to change: it's all about the content, the content, the content. You can do things to reinforce and promote your content but you can't simply establish your presence by repeating the <keyword> same old <keyword> time and <keyword> time again. Excellent!

Sorry, for some all this is either old news or rehashing what has already been said in this thread but it helps me to get things lined up in my mind :-)

Ruud

#26 Grumpus

Grumpus

    HR 6

  • Active Members
  • 786 posts

Posted 20 January 2004 - 09:58 AM

By Jove, I Think He's Got It! :)

One thing to keep in mind:

and now this page can be served as a search result for that term even though the term itself would never ever occur on that page...


That's the ultimate goal, yes. But there are a few things to keep in mind. The first is that all this stuff is still in its infancy. As we can see in certain areas, it's far from perfect - but I still think it's a lot better and it's definitely a good path for Google to be on in the long run.

The second thing to keep in mind is that this is, by no means, replacing any of the other ranking factors that have been in use (at least not yet). It may be enough to kick you up a page or two in the final results, but you've still got to use all the other SEO basics to get into the heap (and as high on the heap as possible before this kicks in).

Now that you've got a good grasp of how it is "supposed" to work, don't assume that it's the most important factor of ranking. It's important to understand, and as time goes on, I see the importance of it growing, but there's always a danger of focusing too much of your SEO energy on one or two things. (As we can see whenever there's a major update and those one or two things people have been focusing on suddenly take a huge hit in the "importance" curve - all these people start screaming, "Google Sucks!").

One thing that is nice about this technology is that it really takes a lot of the individual elements that have been around for a long time and ties them together or improves them.

G.

#27 Gobo

Gobo

    HR 2

  • Members
  • 12 posts

Posted 20 January 2004 - 11:37 AM

I always thought that when Google finds a word that's not in their "dictionary", they do a search in the background to see which possible alternatives for that word give the most results. It seems to make more sense to me to do it that way than to apply semantic algorithms to every typo they find.
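A rough sketch of that "try the alternatives and keep the one with the most results" idea, assuming a small made-up table of result counts. `RESULT_COUNTS`, `edits1` and `suggest` are hypothetical names for illustration; nothing here is confirmed to be how Google handles misspellings.

```python
import string

# Hypothetical result counts standing in for "how many results each word returns".
RESULT_COUNTS = {"shipping": 900, "shopping": 1200, "chopping": 40}

def edits1(word):
    """All strings one edit away: deletions, swaps, replacements, insertions."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def suggest(word, counts=RESULT_COUNTS):
    """Keep a known word; otherwise pick the one-edit alternative with the most results."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else None

# "shpping" is one insertion away from both "shipping" and "shopping".
print(suggest("shpping"))
```

With these made-up counts, "shopping" wins over "shipping" simply because it returns more results; no understanding of the sentence is involved.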

#28 Grumpus

Grumpus

    HR 6

  • Active Members
  • 786 posts

Posted 20 January 2004 - 12:28 PM

Semantics isn't really about typos. It's about learning that a "ford mustang" is a "car" and that a "wild mustang" is a "horse". It's about learning that "used computers" and "computer user" are different things and that someone didn't spell "used" wrong in the second example.

Semantics can play a role in making the typing error detection and correction routines more accurate and effective, I suppose, but that's really a side effect rather than a part of the mission.
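The mustang example can be sketched as a tiny context lookup. This is a toy illustration under assumptions: the `SENSES` table and `disambiguate` function are invented for the example, and a real system would learn the context cues from co-occurrence statistics.

```python
# Hypothetical context cues per sense of an ambiguous word (illustrative only).
SENSES = {
    "mustang": {
        "car": {"ford", "engine", "convertible", "dealer"},
        "horse": {"wild", "herd", "stallion", "prairie"},
    }
}

def disambiguate(word, context, senses=SENSES):
    """Pick the sense whose cue words overlap most with the surrounding words."""
    context_words = set(context.lower().split())
    options = senses.get(word, {})
    scores = {sense: len(context_words & cues) for sense, cues in options.items()}
    if not scores or max(scores.values()) == 0:
        return None  # no cue words found, so no basis for a guess
    return max(scores, key=scores.get)

print(disambiguate("mustang", "ford mustang"))
print(disambiguate("mustang", "wild mustang"))
```

The word itself is identical in both queries; only its neighbours decide whether it is read as a car or a horse.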

G.

#29 Ruud

Ruud

    HR 4

  • Active Members
  • 129 posts
  • Location:Rimouski, Canada (Quebec)

Posted 20 January 2004 - 12:50 PM

...and the mission is to 'understand' the data (read: all web content). Currently there is one large database - it is a freeform database in the ultimate sense. It basically *just* holds data but what it means - nobody knows. The common database way to give all this data meaning would be to assign fields - but you can't really do that with the web. Not even every article header is the actual article title... So the first step is to make an ontology, a gigantic super-thesaurus if you will, which defines not synonyms but relations between words, terms and phrases. It is this 'thing' that will be the treasure chest of each company, of each search engine, as its power to assign some formal meaning and structure to unstructured data stands or falls with it.
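One way to picture such a "super-thesaurus" is as a store of (term, relation, term) triples. The sketch below is only an assumption about shape: the `Ontology` class, the relation name "is_a" and the sample entries are all hypothetical, not any search engine's actual data model.

```python
from collections import defaultdict

class Ontology:
    """A toy store of (subject, relation, object) triples."""

    def __init__(self):
        self._relations = defaultdict(set)

    def add(self, subject, relation, obj):
        self._relations[(subject, relation)].add(obj)

    def related(self, subject, relation):
        """Return the terms directly linked to the subject by this relation."""
        return set(self._relations.get((subject, relation), set()))

    def is_a(self, subject, target):
        """Follow 'is_a' links transitively from subject, looking for target."""
        seen, stack = set(), [subject]
        while stack:
            current = stack.pop()
            for parent in self._relations.get((current, "is_a"), ()):
                if parent == target:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False

# Hypothetical sample entries.
onto = Ontology()
onto.add("ford mustang", "is_a", "car")
onto.add("car", "is_a", "vehicle")
onto.add("wild mustang", "is_a", "horse")

print(onto.is_a("ford mustang", "vehicle"))
print(onto.is_a("wild mustang", "car"))
```

Unlike a thesaurus, nothing here says two terms mean the same thing; it only records how they relate, which is what lets meaning be attached to otherwise unstructured text.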

Ruud

#30 Grumpus

Grumpus

    HR 6

  • Active Members
  • 786 posts

Posted 20 January 2004 - 01:10 PM

I understand this stuff pretty well. My weakness comes in when I try to express this picture I have in my mind into words on a page. Ruud - you've done your reading and are far better at concisely expressing your understanding than I. I'm going to start defaulting to letting you explain this stuff now that you understand it. :lmao:

G.

P.S. I'm rather ecstatic right now - it makes me feel useful when someone can take a topic such as this that I believe to be important (either right now or in the future) and actually get a good grasp on it. The moment of clarity that you had over the past day or so is fairly rare for me to see - due in no small part, no doubt, to my lack of skill in expressing the clutter of crap in my mind in a concise and clear way. So, when someone does "get" it, I often find myself feeling a bit giddy. Thanks for putting the effort in on this and I hope it was worth your time in the long run!



