Session IDs track, well, sessions.
The web is a stateless environment, meaning that when you request a second page from a web site, the server doesn't really know you from Adam. Even though you were just there. This is a problem if you want to maintain shopping cart information or a user logon throughout the whole site. Cookies are one way to track a visitor as they move through your site, but obviously present a problem when the visitor has cookies turned off. A Session ID is little more than a unique string that is dynamically embedded in every single URL the visitor gets to see. Any page request containing that unique string MUST have been requested by the same person. End of problem?
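To make the "unique string embedded in every URL" idea concrete, here is a minimal Python sketch. It is not how PHP or ASP actually do it internally; the parameter name `sid` and the helper names `new_session_id` and `add_session_id` are made up for illustration.

```python
import secrets
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

SESSION_PARAM = "sid"  # hypothetical parameter name, chosen for this sketch


def new_session_id() -> str:
    """Return a random, hard-to-guess string to identify one visitor's session."""
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex characters


def add_session_id(url: str, session_id: str) -> str:
    """Return the URL with the session ID added as a query parameter.

    Any session parameter already present is dropped first, so a stale ID
    is replaced rather than duplicated.
    """
    parts = urlparse(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    query.append((SESSION_PARAM, session_id))
    return urlunparse(parts._replace(query=urlencode(query)))
```

Run every link on a page through something like `add_session_id("/cart.php?item=42", sid)` and each click carries the same string back to the server, which is all the tracking amounts to.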
Maybe not. Go to the same web page on two different days, and you'll likely get two different URLs, with the difference obviously being the Session ID. What happens if you bookmark one of those URLs? The software should be smart enough to realize, when you return, that the session has expired, and it should quickly issue you a new Session ID. Every link on that page will now include the new Session ID instead of the old one. This represents a serious problem for search engines. Every time they visit your site, they get new URLs for the same pages. To them, it looks like you have an infinite number of web pages in your site.
Google has said they don't like to index dynamic pages that include an ID parameter. There is some pretty convincing evidence to suggest they will also avoid any page with a parameter that even LOOKS like a Session ID. Since about 99 percent of the Session IDs being used on the web are generated by PHP or ASP, Google's programmers have a pretty good idea what to avoid. Any long string of numbers, I think, is suspect and should probably be avoided.
So, what's the answer? Use cookies and you lose some visitors because your shopping cart won't work for them. Use embedded Session IDs and the spiders won't crawl your pages. Avoid both and your dynamic site becomes useless, because we MUST be able to somehow identify a visitor as they pass through the site.
RayBat, the script you heard about is probably the only viable solution to the problem. Yes, it needs to know the names for all the crawlers, but that isn't a problem. Those names are pretty standardized and change infrequently. Were that not true, your robots.txt file would be useless, too.
What you're essentially talking about is agent-based cloaking, and that solves one problem at the potential expense of creating a new one. The script receives a request for a page and looks at the user-agent field. If the user-agent is IE or Netscape or Opera, the program creates a dynamic page with a Session ID embedded in every clickable link. If the user-agent is Googlebot or Slurp, it creates the page without a Session ID in the links. Pretty simple, so far.
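For anyone who wants to see the shape of it, here is a rough Python sketch of that user-agent check. It is not the script RayBat heard about, just an illustration. The crawler list contains only the two names mentioned above and would need to be longer in practice; the `sid` parameter and the function names are my own assumptions.

```python
# Substrings that identify crawlers. Only the two names from the post are
# listed; a real list would be longer and would need occasional updates.
CRAWLER_TOKENS = ("googlebot", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header looks like a known crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def render_link(path: str, session_id: str, user_agent: str) -> str:
    """Return the link target to put in the page for this requester.

    Crawlers get the clean path. Everyone else gets the session ID appended
    as a hypothetical "sid" query parameter.
    """
    if is_crawler(user_agent):
        return path
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}sid={session_id}"
```

Note that the only thing that changes between the two branches is the link target. The page content itself is untouched, which is the whole point.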
The problem is that this same technology (and its big brother, IP-based cloaking) can also be used to artificially manipulate search engine ranking. Say, for example, that your script did a little more than just strip the Session ID from all the links. Say that it also randomly threw about 500 very targeted keywords into the page content. A visitor coming to the page with IE would never see those keywords. But the spider would. That obviously creates a bit of a problem for the search engines, so most of them have come out against it. Google has expressly said, "To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking to distort their search rankings."
The key phrase there, I think, is "to distort their search rankings." I have never heard of a site being penalized for removing a Session ID. As long as the page you deliver to the visitors is essentially the same as the page the spider sees, there really shouldn't be a problem. You're not trying to manipulate the SERPs, you're just trying to get in them!
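If you want to sanity-check your own setup against that "essentially the same page" standard, one option is to strip the Session IDs out of the visitor version and compare it with the spider version. The Python sketch below is only an illustration under a couple of assumptions: the session parameter is called `sid`, its value is alphanumeric, and links use a raw `&` rather than `&amp;`. It says nothing about how any search engine actually checks.

```python
import re

# Matches "?sid=VALUE" or "&sid=VALUE", plus a following "&" if there is one.
# "sid" is a hypothetical parameter name used only in this sketch.
SID_PATTERN = re.compile(r"([?&])sid=[0-9A-Za-z]+(&?)")


def strip_session_ids(html: str) -> str:
    """Remove the sid parameter from every URL in the page text."""
    def fix(match):
        lead, trail = match.group(1), match.group(2)
        # If other parameters follow, keep the leading "?" or "&" so the
        # query string stays valid; otherwise drop the parameter entirely.
        return lead if trail else ""
    return SID_PATTERN.sub(fix, html)


def same_apart_from_session_ids(visitor_html: str, spider_html: str) -> bool:
    """Return True if the two pages differ only by session IDs in links."""
    return strip_session_ids(visitor_html) == strip_session_ids(spider_html)
```

If that comparison comes back False, the spider is seeing something your visitors are not, and that is the territory the Google statement is aimed at.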
But knowledge is power, and anyone considering the use of cloaking technology, no matter how innocuous, should be aware of the whole picture. If the dark side of cloaking becomes a bad enough problem that Google eventually automates detection of it, there is at least some possibility that innocent cloaking will be caught in the crossfire. Personally, I don't think that's likely. But you gotta make your own call on that kind of stuff.