Semantics


66 replies to this topic

#1 Paul J

Paul J

    HR 4

  • Active Members
  • 141 posts
  • Location:Minneapolis, MN

Posted 17 January 2004 - 10:21 PM

Does anyone have any links or articles about semantics and how Google and other SEs are measuring its importance?

I've seen a few threads within the last month about how Google is starting to place heavier importance on semantics. It might have been Jill that gave a short description of "copy that converts". That makes sense - long term success.

When I first saw it, I was a bit puzzled. Wouldn't Google with such a complex algo already have placed heavy importance on it for rankings? Or, are they now just able to do it?

Any info is fantastic.

Paul

#2 Grumpus

Grumpus

    HR 6

  • Active Members
  • 786 posts

Posted 17 January 2004 - 11:31 PM

This paper will start out making you wish you hadn't asked this question. It's a paper that describes the processes that were going on over at Applied Semantics before Google bought them up. It's long and you probably won't even be able to get it in a single read - pretty heavy stuff in there. If you can muscle your way through and then go back to the stuff you aren't clear on, you'll have an excellent foundation for understanding how all this works.

http://javelina.cet..../cover_page.htm

This paper doesn't deal specifically with Google, but is an overview of the entire thing (though it does explain some potential uses for it - a search engine included). Once you wrap your mind around the concepts, you can almost look at Google and see it happening.

Also bear in mind that this is only part of it. There's Topic Sensitive PageRank and LocalRank that are also somewhat new to the game (within the last year) and vastly overlooked (I suppose due to the complexities of it all) by most. With the semantics kicking in, the levels of integration of these two are also boosted considerably.

G.

#3 powerofeyes

powerofeyes

    HR 7

  • Active Members
  • 1,123 posts
  • Location:INDIA

Posted 17 January 2004 - 11:53 PM

Very good article, Grumpus. I have already started reading it. I have a couple of documents in my favourites that I hope might interest you:

http://www.ftrain.co..._takes_all.html


Apart from semantics, stemming, Topic Sensitive PageRank and LocalRank, I notice something very strange: a couple of different ranking algorithms for different queries.

This is distinctly visible in different SERPs. Did you notice it, Grumpus? I never saw anyone discuss this in any forum, but I am definitely sure Google now uses a couple of different algorithms for different keyword phrases. Possibly Topic Sensitive PageRank is attached to these different algorithms. I remember some mod suggesting in some thread that Google is possibly using different algorithms for different phrases, and now I am seeing it visibly in different SERPs.

There is NO commercial filter happening in Google, but a couple of different algorithms handling different queries. Possibly the queries are categorized based on spam complaints, the motive of the Google search user, etc. This is definitely interesting.

VIJAY,

#4 awall19

awall19

    Peanut Butter Lover

  • Active Members
  • 502 posts

Posted 18 January 2004 - 01:29 AM

Good stuff... It's reading stuff like that which makes me want to go to college and play with data mining more than trying to manipulate it on the other end.

#5 Grumpus

Grumpus

    HR 6

  • Active Members
  • 786 posts

Posted 18 January 2004 - 09:27 AM

Before I discuss that excellent article, let's look at this, first:

Couple of different ranking algorithm for different queries


I've heard that. I've seen what appears to be that. I don't think that's exactly what's happening, though. As the article you link to states, Semantics requires a lot of confirming data before a safe assumption can be made. In the cases of some competitive areas the problem comes in because everyone is fighting for the same small set of words. These sites, in order to be competitive for their keywords, have lots of "pointers" (to use the article's name for them) but they don't point to anything - except each other.

Instead of:

(Scott Rahin) has a (Martin Guitar).
[Scott's] (Martin Guitar) is a model (245).


Many of these pages have things like:

buy [Scott Rahin's Martin guitar]. [Scott Rahin's Martin Guitar] is a Model 245. [Scott Rahin's Martin Guitar] rocks! You're stupid if you don't buy [Scott Rahin's Martin guitar]!

This is overly simplified, but you'll notice that in the original example, it identifies concepts through separation. "Scott" has "Guitar". "Guitar" is "model 245".

In the second example, Scott is never separated from his guitar, so it's "Scott Rahin's Martin Guitar" no matter how you cut it. You can't extrapolate any semantics from it because it's always used in the same way and as a complete phrase.
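To make the separation idea concrete, here's a toy Python sketch. It is only an illustration under my own assumptions, not anything Google or Applied Semantics is known to run: the helper names `count_occurrences` and `standalone_ratio` and the sample strings are made up. It simply measures how often a concept shows up outside the full key phrase.

```python
import re


def count_occurrences(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def standalone_ratio(text: str, concept: str, key_phrase: str) -> float:
    """Fraction of the concept's occurrences that sit outside the key phrase.

    Assumes the concept appears exactly once inside each key phrase.
    """
    total = count_occurrences(text, concept)
    if total == 0:
        return 0.0
    inside = count_occurrences(text, key_phrase)
    return (total - inside) / total


# Copy where the concepts are separated from each other.
separated = (
    "Scott Rahin has a Martin Guitar. "
    "Scott's Martin Guitar is a model 245."
)
# Copy where the concept only ever appears inside the full key phrase.
stuffed = (
    "Buy Scott Rahin's Martin guitar. "
    "Scott Rahin's Martin Guitar is a Model 245. "
    "Scott Rahin's Martin Guitar rocks!"
)

key_phrase = "Scott Rahin's Martin Guitar"
separated_ratio = standalone_ratio(separated, "Martin Guitar", key_phrase)
stuffed_ratio = standalone_ratio(stuffed, "Martin Guitar", key_phrase)
```

In the separated copy every mention of "Martin Guitar" stands on its own, so there is something for a pointer to point at. In the stuffed copy none do, which is the situation where I'd expect nothing useful to be extracted.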

Now, if 5,000,000 other web pages (out of 9,000,000 or whatever) are all using the same keyphrase of "Scott Rahin's Martin Guitar" in exactly the same way, then the "pointer" becomes that phrase and it points to phrases like "rocks" and "you're stupid" and "buy". The only useful word in there is "buy" and even then, it's not pointed to by "Martin Guitar", it's pointed to by "Scott Rahin's Martin Guitar".

So, in your competitive markets where people vie for the same terms and everyone is optimizing pages for the "terms" and not the "concept" of a "martin guitar" owned by "Scott Rahin", the whole semantics thing falls apart.

In the end, if there's a sector like this (think "real estate" and "hotels" and "airfares" and other highly optimized and competitive areas) then the semantics have no hope of working so it either works poorly, or Google has something in there to kick that part of the algo out.

(I suspect that at the first stages of Florida, they allowed it to work poorly - hence the weird results, and later tweaked the levels of semantical requirements to trigger the "use it or not use it" levels - and those adjustments continue today).

G.

P.S. I need some coffee and to whip off a few e-mails, but I'll be back in a few to point out some excellent points and some slight inaccuracies (most likely they are in there to keep it simple, but...) in that article. It's a great read!

#6 Grumpus

Grumpus

    HR 6

  • Active Members
  • 786 posts

Posted 18 January 2004 - 10:33 AM

Okay - let's go play around in that article.

The first section is a pretty good description of the basics of the semantic web. It's a bit simplified, but for those who don't have the time (though if you are an SEO, there's really no excuse for not having the time to understand that), it's an excellent "capsule" of it all.

After the first section, the article moves into "the future" and possible applications of this technology over time. While they may not be particularly accurate (or maybe they will prove to be pretty accurate) each section does go into fleshing out the concepts and seeing how they can be used.

There are a few things I can say definitely won't happen, though.

Amazon and Ebay - remember them? - doubtless saw the new product and realized they were in a bind. They would have to “cannibalize their own business” in order to go the Google path - give up their databases to the vagaries of the Web. So, in classic big-company style, they hedged their bets and did nothing.


Maybe that's true with eBay, but not Amazon. Amazon, remember, scooped up Alexa a while back. Alexa isn't so much a "search" property, but rather, an "indexing engine" which is, as this article points out, the primary problem with the Semantic Web - you have no way of indexing all the documents.

There's no way to see what Amazon is doing with their data, but there are some new features back in the associates section that indicate that they are making great strides toward excellence in this field. They won't be search players, but they actually have the jump on Google (as far as starting to work with these concepts ahead of them). Google will likely still win because they are throwing more resources at the efforts, but... don't rule out Amazon.

What that really means is that RDF is data about web data - or metadata.


This article seems to focus on data already being formatted for Google to be able to use. It describes applications that will take your input and format it and send it out all over.

It is a fact that formatting data in a particular way helps Google considerably because it doesn't have to extrapolate it. Take Froogle, for example. You can send them a feed (yup, it's an RDF feed) of your products and it'll ensure accuracy and completeness for your listings. But, a huge bulk of the data in Froogle is crawled and extrapolated. Sure, there are some mistakes (one of our cre8asite moderators is listed as a purchasable product for a mere $99.95). Overall, though, Google is pretty darned good at identifying products within web pages that are set up properly.

This is why, over the past few weeks here and over the past year or so over at Cre8asite, I've been loudly advocating navigational, site and page structural, and overall consistency on your pages. Most of the time, when I talk about these things, the topic brings up no real discussion or turns to something rather unrelated. It's rather sad really, because all this stuff - even if it's not of critical importance now, will be important soon.

In a post about How Consistent Page Structure Allows Google To Extract and Assume Specific Information, we can see how Google doesn't always need an RDF feed to make its Semantic extractions. It just needs to be set up in a way that it can identify what's what. In that example, you can also see that it doesn't necessarily have to be products for which it's extracting the information. And, even if you only have a few pages on your site, you can still be in good shape if there are enough people in your sector using the same "pointer words" and they have employed consistency factors (even if they aren't exactly consistent with your own layout). This post also talks about how URL structure (even though what you name the directories and pages isn't important) can help in this too.

This post about navigational structure helps you understand how you can use site structure to achieve the same types of results. The pyramid scheme (in this case, that's a good thing) allows your "pointers" to be pointing to concepts on the deeper pages and not on the same page as in the first example.

----

I should point out that these techniques now work very well on larger sites. This is true because, as we said, semantics needs a "pattern" in order to recognize the pointers and what they point to. Smaller sites may not have enough pages for a pattern to emerge - but that doesn't mean it's not worth your time to learn about and implement these concepts on your sites. The reason is that this whole thing works across multiple sites. Google can compare the information on your site against all the others in your market. If you all use the same range (or a similar range) of pointers, and have consistency for yourselves even if you don't match up layout-wise with the others, then pretty soon, Google has enough information to start applying semantics to your pages.

----

Before I go, let me now expand on my conclusion from my previous post:

if there's a sector like this (think "real estate" and "hotels" and "airfares" and other highly optimized and competitive areas) then the semantics have no hope of working so it either works poorly, or Google has something in there to kick that part of the algo out.


In the areas that were greatly affected by the Florida Update, was there some sort of recognizable pattern among the batch of sites that ended up at the top? They may not have been the most relevant pages, but they may very well have been the ones that were best in "semantical synch" with each other using identifiable pointers (not keyterms) with those common pointers pointing to something relevant to the term.

Obviously, in those competitive sectors, this would make a huge difference and the power of the semantics part of the algo was weighted too heavily - it brought out consistency on a page (or set of pages) rather than explicit relevancy. Since then, they've adjusted the weights to make it better, but they aren't going to go back to the way it was before. If you can't get your site back up in the ranks, I suggest it's not because of something wrong with Google, but rather that you're focusing too much on key phrases rather than establishing "pointers" and then pointing them at something.

As time goes by, the semantics portion of the algo is only going to increase in importance. It does work - if you are presenting the data to them in the right way. There will be a grace period (how long? I dunno) where you'll be able to rank well using the "keyword" model but over the next 12-24 months, expect that model to slowly and progressively become less effective as more and more SEOs embrace the new technology and those patterns and semantic extrapolations become more the norm in web design.

Many of you are going to read this and, as usual, say something to the effect of "Oh, that Grumpus is full of it." That's fine. I'm used to it. :rolleyes: I do have an excellent track record over the past couple of years of introducing new concepts that are on the horizon. This stuff is really no longer on the horizon, though. Our toes are right up there on that line and if you aren't up to speed on this stuff, then it's definitely time to get there.

If this thread spawns off some good discussion on this topic, I'll be more than happy to share some insights into TSPR (Topic Sensitive PageRank) and LocalRank and some of the other elements that are relatively new and complicated. I'm not going to go into those now, though, as it usually ends up being wasted time to introduce such things without a resulting discussion about it. (Remember, I'm learning all of this stuff, too, so it's the discussion process that helps me figure it all out in my mind.) :D

Have fun - there's a lot of stuff to read and understand in the top several posts of this thread. Believe me: It's well worth your time if you plan on doing SEO in the coming months and years.

G.

#7 powerofeyes

powerofeyes

    HR 7

  • Active Members
  • 1,123 posts
  • Location:INDIA

Posted 18 January 2004 - 01:10 PM

I've heard that. I've seen what appears to be that. I don't think that's exactly what's happening, though.

OK, let me give some examples of what I am seeing. What I am seeing is at least 3 different ranking algorithms: one database of keywords and phrases goes through some heavy filtering (possibly a spam filter, or some sort of filter which judges the motive of the search user); another set of keyword phrases goes through just normally, with repeated phrases; and the third, which is the most common algorithm we are seeing, supports sprinkling and breaking down of keyword phrases. This last one seems to be more on the semantics side.

Example 1. If you look at the SERPs for "Volatile Shoes" or "converse shoes", they go through a heavy filter; you can see all sorts of directories there. I give these shoe SERPs as examples because I worked on a couple of shoe sites both before and after the Florida update. Before the Florida update, the same SERPs were full of affiliates, spam sites, redirect sites, etc. Now all of those are filtered out of the results, leaving these junk results. It is very difficult to guess what Google is really doing in these SERPs. This is the worst filter, and it hit many sites.

Example 2. These are much more similar to pre-Florida SERPs, where lots of repeated keyword phrases and over-optimizing everything was allowed and many sites thrived. Again, I give samples only from sites which I worked on or analyzed closely. There is virtually no change in the results before and after Florida; most of the sites remained in the same place. See the SERPs for "search engine optimization", "rocket dog shoes" or "unlisted shoes".

Example 3. This is the most common SERP we are seeing: distribution and sprinkling of keywords, breaking down and distribution of keyword phrases, lots of stemming results, possible semantics play, etc. See the SERPs for "california health insurance", "car shipping" or "time tracking software". These results are more relevant, but still lots of sites are filtered from these SERPs.

All 3 examples are completely different. I have worked on all the examples I have given here and have observed a lot before and after the Florida update.

These are possibly the result of different algorithms handling different keywords and phrases. It looks to be a very interesting thing. I have been analyzing these types of results for the past 1 1/2 months and I don't see any big changes, so I am definitely seeing different algorithms applied to different keyword phrases.


Now let me come to semantics,
That document I referred to had some excellent speculations about what the future Google could be, and it also gives a good briefing about semantics.

This is another good article:

http://www.scientifi...umber=1&catID=2

Some more information on semantic web and RDF

http://www.w3.org/2002/07/swint

http://www.w3.org/TR/rdf-mt/

http://reliant.tekno...F-IJCAI2001.pdf

I'll post my ideas on semantics and Google here after some sleep.

VIJAY

#8 torka

torka

    Vintage Babe

  • Moderator
  • 4,392 posts
  • Location:Triangle area, NC, USA, Earth (usually)

Posted 18 January 2004 - 05:14 PM

If this thread spawns off some good discussion on this topic, I'll be more than happy to share some insights into TSPR (Topic Sensitive PageRank) and LocalRank and some of the other elements that are relatively new and complicated. I'm not going to go into those now, though, as it usually ends up being wasted time to introduce such things without a resulting discussion about it. (Remember, I'm learning all of this stuff, too, so it's the discussion process that helps me figure it all out in my mind.) :applause:

I don't know what exactly I could contribute to such a discussion other than asking the occasional newbie question, but I'd be interested in hearing your thoughts on these topics, G.

I feel as though I've just started to get a handle on the "old" SEO and everything's set to change. This isn't a complaint -- rather, I think if I can get a good grasp of what's coming down the pike NOW, I may find myself ahead of the game later on, while others who were content to keep thinking in the old paradigms struggle to catch up.

I'm eager to learn more. I've bookmarked both the articles previously referred to and will at least put a start on wading through them after my son goes to bed tonight. Do you have any further suggested reading, or some additional thoughts that might help me clarify my thinking on these topics?

--Torka :censored:

#9 Grumpus

Grumpus

    HR 6

  • Active Members
  • 786 posts

Posted 18 January 2004 - 06:27 PM

Torka - start with the one Vijay posted as it's much better as an introduction. The one I posted will be a lot easier to digest once you have the basics of it down.

When you get through, feel free to post some questions on it (you'll have plenty, believe me) :censored: From there we can start exploring some other areas, but I don't want to move too fast - each area is fairly overwhelming. LocalRank and TSPR can be explained fairly simply - but it can be explained even more simply if there is some background in the semantics area first.

G.

#10 Steve Sardell

Steve Sardell

    HR 5

  • Active Members
  • 331 posts
  • Location:Hilton Head Island, SC, USA

Posted 18 January 2004 - 07:14 PM

Hey Grumpus,

Glad you and Vijay are getting us into the deep territory. Like Torka, I need to do quite a bit of studying to begin grasping it, but that is what it is all about. With many things changing, some radically, and the web becoming so competitive, it is my feeling the ones not willing to keep pace with the changing environment will be left in the dust. IMHO the basics may remain, but they will not be enough.

#11 Paul J

Paul J

    HR 4

  • Active Members
  • 141 posts
  • Location:Minneapolis, MN

Posted 18 January 2004 - 09:01 PM

I'm with Torka and Steve.

I've only read through Vijay's first link. Lots of good info. Tomorrow I'll be pretty darn busy reading everything else, and I'll definitely be interested in hearing more about LocalRank.

#12 Scottie

Scottie

    Psycho Mom

  • Admin
  • 6,294 posts
  • Location:Columbia, SC

Posted 18 January 2004 - 11:10 PM

I've read most of the posted links (not finished with the original link from Grumpus yet! Vijay, those are very readable explanations overall :zz:)

Do we know what Kaltix has developed? The leap in all of these theories (IMO) is HOW the data gets into the ontologies for logic/comparison/establishing relationships. Is Google building RDFs for every possible topic and running queries through them? Maybe they are, but that seems a little difficult.

And, how do they determine trustworthiness of the sources? Are we back to PR ratings? If enough people say Bill Gates is the devil, is he? :lol:

Here is where I see the difficulties in using semantics, and I'll use an example Grumpus posted a while back:

<Love> is <blind>.
<Ray Charles> is <blind>.
-therefore-
<Ray Charles> is <love>.

Heck, we've now got two pages on the web that appear to confirm that Ray Charles is love.
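Here's a toy Python sketch of that trap, assuming facts are stored as simple (subject, property) pairs. The helper `naive_infer` is made up for illustration and is deliberately flawed; it is not how any real reasoner is known to work.

```python
def naive_infer(facts):
    """Deliberately flawed rule: if A and B share a property, conclude 'B is A'."""
    inferred = set()
    for subj_a, prop_a in facts:
        for subj_b, prop_b in facts:
            # Two different subjects with the same property get equated.
            if subj_a != subj_b and prop_a == prop_b:
                inferred.add((subj_b, subj_a))
    return inferred


facts = [("Love", "blind"), ("Ray Charles", "blind")]
conclusions = naive_infer(facts)

for subject, other in sorted(conclusions):
    print(f"<{subject}> is <{other}>.")
```

The rule treats a shared property as if it meant identity, so it happily concludes both that Ray Charles is love and that love is Ray Charles. Anything that does inference for real needs a way to rule that kind of step out.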

All very fascinating stuff, and I have no doubt we will be using our own agents shortly to trade real world information like appointments and items for sale, but I'm not yet seeing how this applies to searching, today. Useful applications for the semantic web still require published RDF data.

Even looking at AdSense results (where most people assume semantic logic is being used to deliver relevant ads) I see what still appears to be obvious simple syntax keyword matching for results- some are pretty silly. IMO, they would want to perfect semantic analysis via AdSense before rolling it out to mainstream search results...

What is missing is the AI (artificial intelligence) that builds the relationships and makes inferences that Paul and Jim are friends because Paul said it on his web site, even though Jim doesn't mention it on his. The "flexibility" of semantic search is the ability to have these inferences created without human intervention or "rules" manually created. Has Google figured that out? IMO, not in the search results at this time.

#13 Scottie

Scottie

    Psycho Mom

  • Admin
  • 6,294 posts
  • Location:Columbia, SC

Posted 18 January 2004 - 11:28 PM

Paul, I'm looking for the LocalRank patent so you can read for yourself, (the patent office site appears to be down right now) but here's my take on the patent:
  • Standard PR is used to pull the top 1000 (or whatever number they choose) results.
  • Those results are examined for duplicate information and similar hosts. Dups and pages on the same IP block are tossed, keeping only the highest ranking page that is unique and relates to the query.
  • Then the results are re-sorted according to how they link to each other. (That's the local part- how many pages within that SERP think your page is important?)
  • The original rank and new rank are somehow combined to create a new score (this helps in the event none of the results link to each other.)
  • Final results are delivered to the user.
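The steps above can be sketched in Python. This is only my reading of the patent turned into a toy, not the actual LocalRank method: the `Page` fields, the `host_block` stand-in for "same IP block", and the weighted sum used to combine the two scores are all assumptions, since the list only says the ranks are "somehow combined".

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    host_block: str   # stand-in for "same IP block / similar host"
    score: float      # original ranking score, higher is better
    links_to: set = field(default_factory=set)  # URLs this page links to


def local_rerank(pages, top_n=1000, weight=0.5):
    """Re-rank a result set by how its pages link to each other."""
    # Step 1: take the top N results by original score.
    top = sorted(pages, key=lambda p: p.score, reverse=True)[:top_n]

    # Step 2: keep only the highest-scoring page per host block.
    best = {}
    for page in top:
        if page.host_block not in best:
            best[page.host_block] = page
    kept = list(best.values())
    kept_urls = {p.url for p in kept}

    # Step 3: local score = number of other kept pages linking to this page.
    local = {p.url: 0 for p in kept}
    for page in kept:
        for target in page.links_to & kept_urls:
            if target != page.url:
                local[target] += 1

    # Step 4: combine old and new scores (hypothetical weighted sum).
    max_local = max(local.values(), default=0) or 1
    max_score = max((p.score for p in kept), default=0) or 1

    def combined(page):
        old_part = (1 - weight) * page.score / max_score
        new_part = weight * local[page.url] / max_local
        return old_part + new_part

    # Step 5: deliver the final ordering.
    return [p.url for p in sorted(kept, key=combined, reverse=True)]


pages = [
    Page("a.example/1", "block-a", 0.9),
    Page("a.example/2", "block-a", 0.8),           # same block as a.example/1
    Page("b.example", "block-b", 0.7, {"c.example"}),
    Page("c.example", "block-c", 0.6),
    Page("d.example", "block-d", 0.5, {"c.example"}),
]
ranked = local_rerank(pages)
```

In this made-up result set, c.example starts third among the surviving pages but moves to the top because two other results link to it, while a.example/2 is dropped as a same-block duplicate. If none of the results link to each other, the combined score falls back to the original order.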


#14 Steve Sardell

Steve Sardell

    HR 5

  • Active Members
  • 331 posts
  • Location:Hilton Head Island, SC, USA

Posted 18 January 2004 - 11:34 PM

Do we know what Kaltix has developed? The leap in all of these theories (IMO) is HOW the data gets into the ontologies for logic/comparison/establishing relationships.

Hi Scottie,

In a nutshell, NO. Since being taken over by G they have been pretty mum regarding how personalization will be accomplished. The basis is search history. Below is an old news article. It merely states the basics. But, for those who do not know about Kaltix it is a good starting point. I have a white paper some place, and will try and dig it out.

http://news.com.com/..._3-5061873.html

#15 Scottie

Scottie

    Psycho Mom

  • Admin
  • 6,294 posts
  • Location:Columbia, SC

Posted 18 January 2004 - 11:37 PM

OK- I had that wrong. Kaltix is about personalization and local computation, not semantic analysis, according to that article. Thanks Steve! :lol:

Added- Grumpus, excellent article and very readable. Worth the time investment, thanks for pushing us to read it! I think it is answering my question posed above- about the AI factor... and very good info on stemming too!



