
Session Id


This topic has been archived. This means that you cannot reply to this topic.
9 replies to this topic

#1 RayBat

RayBat

    HR 1

  • Active Members
  • Pip
  • 4 posts

Posted 31 July 2003 - 06:17 AM

Hi everybody,

after understanding how to deal with dynamic URLs ;), I'd like to find out how to manage session ids. Most of what I read says that session ids are just a "no go" for SEO and that URLs with session ids will not be crawled (e.g. Shari Thurow, Search Engine Visibility, page 159: URLs that contain session IDs are often the kiss of death in the search engines).
I'm not a programmer (forgive my incorrect wording in the following statements :lol: ), but as far as I know there are three basic ways to implement a session id:

1) using a cookie
2) using a GET parameter in the URL
3) using a program that puts a long string of characters in your URL as the session id

First question: is that right? Are there more relevant options out there?

Cookies seem to be a nice thing, because crawlers can't accept cookies (or can they?) and so there is no problem with SEO. But users can turn cookies off, so nobody who wants to do some "serious" user tracking uses cookies - right?

The other two options seem to have the problem of producing URLs that can't be crawled. Is there any way to deal with session ids?

I know that there is a script out there that identifies robots and gives them URLs without session ids. This is not what I'm looking for, because for this script you need the names of all crawlers. The search engines would not be very clever to use only one known standard name for their crawlers all the time, because this would open a huge spamming door...

Thanks, I hope this is an interesting one :lmao:
RayBat

#2 Jill

Jill

    Recovering SEO

  • Admin
  • 33,244 posts

Posted 31 July 2003 - 08:11 AM

Welcome, RayBat! :lmao:

I'm sure one of our programmers will chime in to help you. As far as I know, and according to Google, the page extension doesn't matter at all. They will index any page extension, even made up ones.

Jill

#3 Alan Perkins

Alan Perkins

    Token male admin

  • Admin
  • 1,648 posts

Posted 31 July 2003 - 08:19 AM

The short answer is in the name itself: session id. The concept of a "session" is lost on spiders. Spiders don't visit in a session, so they should not have a session id.

Whichever way you implement sessions, you need to ensure that visitors aren't forced to use them.

#4 RayBat

RayBat

    HR 1

  • Active Members
  • Pip
  • 4 posts

Posted 31 July 2003 - 11:13 AM

Thank you for the quick answer!

@Jill: Hey, the next post will be in the pub again, just to confuse you :) .

Google doesn't seem to like session ids. In their webmaster section they say:

"Allow search bots to crawl your sites without session ID's or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page."

Is there any source that can tell how well Google can index URLs that include session ids? Or is Google able to skip the session id part? Or is it just black and white: session id in URL = no crawl?

@Alan: I'm really happy to talk to you! I'm trying to understand all the alternatives for using session ids (including advantages/disadvantages). I found some web pages telling me that there are some "innovative" ways of dealing with session ids in URLs (like here: www.search-optimisation.com), but I can't find detailed explanations. The company I'm working for is already using session ids in URLs and I'd like to find out if they have to change their user tracking system or if there is an SEO-friendly workaround for the session id problem. Google says: "Allow search bots to crawl your sites without session ID's ....". How can I do that without giving up on user tracking and without using cookies? *hope to lure you into the long version of the answer* :)

THANKS, this forum is really great,
RayBat

#5 Ron Carnell

Ron Carnell

    HR 6

  • Moderator
  • 968 posts

Posted 31 July 2003 - 11:22 AM

Session IDs track, well, sessions.

The web is a stateless environment, meaning that when you request a second page from a web site, the server doesn't really know you from Adam. Even though you were just there. This is a problem if you want to maintain shopping cart information or a user logon throughout the whole site. Cookies are one way to track a visitor as they move through your site, but obviously present a problem when the visitor has cookies turned off. A Session ID is little more than a unique string that is dynamically embedded in every single URL the visitor gets to see. Any page request containing that unique string MUST have been requested by the same person. End of problem?
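To make the "unique string embedded in every URL" idea concrete, here is a minimal sketch in Python. The parameter name `sid` and both function names are made up for illustration; real PHP or ASP session handling does this for you and will differ in the details.

```python
import secrets
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

SESSION_PARAM = "sid"  # hypothetical parameter name, chosen for this sketch


def new_session_id() -> str:
    """Return a random, hard-to-guess string to identify one visitor's session."""
    return secrets.token_hex(16)


def add_session_id(url: str, session_id: str) -> str:
    """Return the URL with the session ID appended as a query parameter."""
    parts = urlparse(url)
    # Drop any stale session parameter, keep every other parameter as-is.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    query.append((SESSION_PARAM, session_id))
    return urlunparse(parts._replace(query=urlencode(query)))
```

Every link written into a page would be passed through something like `add_session_id()`, so whichever link the visitor clicks, the next request carries the same string back to the server.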

Maybe not. Go to the same web page on two different days, and you'll likely get two different URLs, with the difference obviously being the Session ID. What happens if you bookmark one of those URLs? The software should be smart enough to realize, when you return, that the session has expired and will quickly issue you a new Session ID. Every link on that page will now include the new Session ID instead of the old one. This represents a serious problem for search engines. Every time they visit your site, they get new URLs for the same pages. To them, it looks like you have an infinite number of web pages in your site.
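A rough sketch of why that looks like an endless site to a spider: two visits give two URLs that only match once the session parameter is removed. The parameter names and the example.com URLs below are assumptions for illustration, and nothing here is meant to describe what any search engine actually does.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

# Assumed session parameter names; a real site may use something else entirely.
SESSION_PARAMS = {"sid", "phpsessid", "sessionid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any known session parameter removed."""
    parts = urlparse(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunparse(parts._replace(query=urlencode(kept)))


# The same page as a spider might see it on two different visits.
first_visit = "http://www.example.com/products.php?cat=7&sid=a1b2c3"
second_visit = "http://www.example.com/products.php?cat=7&sid=d4e5f6"

# As raw strings the two URLs differ, so they look like two separate pages.
looks_like_two_pages = first_visit != second_visit
# With the session parameter removed they are the same address.
same_page = strip_session_id(first_visit) == strip_session_id(second_visit)
```

A spider that cannot tell which parameter is the session ID has no safe way to do this stripping itself, which is the whole problem.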

Google has said they don't like to index dynamic pages that include an ID parameter. There is some pretty convincing evidence to suggest they will also avoid any page with a parameter that even LOOKS like a Session ID. Since about 99 percent of the Session IDs being used on the web are generated by PHP or ASP, Google's programmers have a pretty good idea what to avoid. Any long string of numbers, I think, is suspect and should probably be avoided.

So, what's the answer? Use cookies and you lose some visitors because your shopping cart won't work for them. Use embedded Session IDs and the spiders won't crawl your pages. Avoid both and your dynamic site becomes useless, because we MUST be able to somehow identify a visitor as they pass through the site.

RayBat, the script you heard about is probably the only viable solution to the problem. Yes, it needs to know the names for all the crawlers, but that isn't a problem. Those names are pretty standardized and change infrequently. Were that not true, your robots.txt file would be useless, too.

What you're essentially talking about is agent-based cloaking, and that solves one problem at the potential expense of creating a new one. The script receives a request for a page and looks at the user-agent field. If the user-agent is IE or Netscape or Opera, the program creates a dynamic page with a Session ID embedded in every clickable link. If the user-agent is Googlebot or Slurp, it creates the page without a Session ID in the links. Pretty simple, so far.
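In code, that check is only a few lines. This is a sketch under assumptions: the two crawler tokens are just the ones named above (a real list would be longer and need upkeep), and `sid` and the function names are made up for the example.

```python
# Substrings to look for in the User-Agent header (lower-cased).
# Only the two spiders mentioned in this thread; a real list would be longer.
CRAWLER_TOKENS = ("googlebot", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string matches a known spider."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def link_for(path: str, session_id: str, user_agent: str) -> str:
    """Build a link: plain for spiders, with a session ID for everyone else."""
    if is_crawler(user_agent):
        return path  # spiders get the same clean URL on every visit
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}sid={session_id}"
```

Note that the only thing that changes between the two cases is the link, not the page content.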

The problem is that this same technology (and its big brother, IP-based cloaking) can also be used to artificially manipulate search engine ranking. Say, for example, that your script did a little more than just strip the Session ID from all the links? Say, for example, that it also randomly threw about 500 very targeted keywords into the page content? A visitor coming to the page with IE would never see those keywords. But the spider would. That obviously creates a bit of a problem for the search engines, so most of them have taken a stand against it. Google, for one, has expressly said, "To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking to distort their search rankings."

The key phrase there, I think, is "to distort their search rankings." I have never heard of a site being penalized for removing a Session ID. As long as the page you deliver to the visitors is essentially the same as the page the spider sees, there really shouldn't be a problem. You're not trying to manipulate the SERPs, you're just trying to get in them!

But knowledge is power, and anyone considering the use of cloaking technology, no matter how innocuous, should be aware of the whole picture. If the dark side of cloaking becomes a bad enough problem that Google eventually automates detection of it, there is at least some possibility that innocent cloaking will be caught in the cross-fire. Personally, I don't think that's likely. But you gotta make your own call on that kind of stuff. :)

#6 Mel

Mel

    HR 5

  • Active Members
  • PipPipPipPipPip
  • 353 posts

Posted 31 July 2003 - 11:35 AM

Very nice answer Ron.

There are some other valid uses for what I like to call "ethical cloaking".

Some sites have content that is only viewable after entering a username and password, and obviously the spiders can't do that. But if you have a script that forces browser requests to go through the username and password routine, while allowing named spiders to go directly to the page, you then get more of your site indexed, and you're using cloaking to ensure that the users and spiders see the same content.

#7 RayBat

RayBat

    HR 1

  • Active Members
  • Pip
  • 4 posts

Posted 31 July 2003 - 12:20 PM

Great answer Ron, this is what I was looking for! :)
And thanks for the good hint Mel! :)

New questions will be coming up soon!

Thanks, /bow
RayBat

#8 Jill

Jill

    Recovering SEO

  • Admin
  • 33,244 posts

Posted 31 July 2003 - 01:31 PM

Mel wrote: There are some other valid uses for what I like to call "ethical cloaking".


For the record, I wouldn't call it ethical cloaking. It's not cloaking at all.

You're showing the users and spiders the same thing. No cloaking. I forget exactly what it is called though. My mind seems to be blank today! Maybe I should have slept some time this week. ;)

Jill

#9 robwatts

robwatts

    HR 5

  • Active Members
  • PipPipPipPipPip
  • 308 posts

Posted 03 August 2003 - 01:41 AM

Jill wrote: For the record, I wouldn't call it ethical cloaking. It's not cloaking at all.


The technical term is User Agent delivery.

I hear what Mel is saying though. :)

You are serving up two different scenarios to the spider, and are (in the scenario Mel painted) hiding the lines of code that create the session id.

I don't want to turn this into a definition of cloaking thread, so I will now shut up. :)

Edited by robwatts, 03 August 2003 - 01:49 AM.


#10 Alan Perkins

Alan Perkins

    Token male admin

  • Admin
  • 1,648 posts

Posted 04 August 2003 - 04:09 AM

I've broken out the discussion on Indexing Password-Protected Content.



