QUOTE(Michael Martinez @ Oct 20 2005, 01:28 PM)
A few of the sites have hidden INPUT elements on their pages. In the past, this has not been an issue because hidden INPUTs are used to transfer data from HTML pages to CGI scripts. I don't believe this SHOULD be an issue, but I'll discuss this in my HYPOTHESES section.
And I did NOT do that because I ran out of time and only just now remembered I wanted to say something.
There should be a number 4 speculation that reads:
"Google may be filtering on certain elements that include 'hidden' in their syntax" and the current number 4 "Google may have seriously broken its ranking software" would be number 5.
Remember, number 1 is the hypothesis I feel is the strongest, best case for fitting the facts, and number 5 is the one I consider to be least likely.
I only bring it up because of something I read on Matt Cutts' blog. I don't believe he reveals all the factors that go into detecting the kinds of sites that he features as having been penalized by Google, but he did specifically refer to the "hidden" syntax.
I don't recommend people use this kind of syntax if they can avoid it, but a number of Webmasters have objected to the apparent filtration of intentionally hidden elements. Matt has offered some clarification, indicating that apparent intent is being taken into consideration.
Can apparent intent be identified (or misidentified) algorithmically? I believe Google could take a reasonable stab at it, but they would have to program a lot of assumptions into their software.
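To show what I mean by "programming assumptions," here is a toy sketch (entirely my own illustration, not anything Google has published) of how hidden elements might be scored. Every rule here is an assumption I made up: a hidden INPUT inside a FORM looks like normal CGI plumbing, while a lot of hidden text elsewhere looks more like keyword stuffing.

```python
# Hypothetical heuristic for guessing the "apparent intent" behind a
# hidden element. All thresholds and rules are invented assumptions
# for illustration only.

def hidden_element_suspicion(tag, inside_form, hidden_text_length):
    """Return a crude 0.0-1.0 suspicion score for a hidden element."""
    if tag == "input" and inside_form:
        return 0.0                     # assumed legitimate: CGI data transfer
    score = 0.5                        # assumed: any other hidden element is questionable
    if hidden_text_length > 100:       # assumed: long hidden text suggests stuffing
        score += 0.4
    return min(score, 1.0)

print(hidden_element_suspicion("input", True, 0))    # 0.0 -- hidden form field
print(hidden_element_suspicion("div", False, 500))   # 0.9 -- hidden block of text
```

The point of the sketch is just that any such scoring bakes in guesses about intent, and those guesses are exactly where misidentification creeps in.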
But Google did pay a group of people (and may, for all I know, still be paying them) to manually evaluate suspicious Web sites. I forget what the program was, but it was left exposed to outside scrutiny for a while and people even circulated the guidelines provided to the Web site "auditors" (if I may call them that -- I don't know what Google actually called them).
I am very confident that the feedback provided by those people could have been boiled down to a set of programmable rules which, yes, could probably implement a seriously effective filtration system.
In effect, Google may have begun implementing a Bayesian SE spam filtering system. Anyone who has worked with such spam filters for their email, which attempt to learn what is acceptable and what is not, should be able to see what I am referring to.
I would expect poor results from a Bayesian filter at first, but to see considerable improvement over time. On the other hand, practical experience in the real world has led many people to conclude that Bayesian filters are not all they are cracked up to be. There is a limit to the effectiveness of the technology, and it requires considerable human intervention (someone has to identify what is and is not spam).
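For anyone who hasn't worked with one, here is a minimal sketch of the kind of Bayesian (naive Bayes) text filter used for email spam. The training examples and the whitespace tokenization are toy assumptions of mine; the mechanism, though, is the real one: a human labels examples as spam or not, and the filter learns word probabilities from those labels.

```python
import math
from collections import Counter

# Minimal naive Bayes text filter: learns word frequencies from
# human-labeled "spam" and "ham" examples, then classifies new text
# by comparing log probabilities (with add-one smoothing).

class NaiveBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}   # word tokens seen per class
        self.docs = {"spam": 0, "ham": 0}     # documents seen per class

    def train(self, label, text):
        words = text.lower().split()
        self.counts[label].update(words)
        self.totals[label] += len(words)
        self.docs[label] += 1

    def classify(self, text):
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        scores = {}
        for label in ("spam", "ham"):
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.docs[label] / sum(self.docs.values()))
            for w in text.lower().split():
                p = (self.counts[label][w] + 1) / (self.totals[label] + vocab)
                score += math.log(p)
            scores[label] = score
        return max(scores, key=scores.get)

f = NaiveBayesFilter()
f.train("spam", "buy cheap pills now")
f.train("spam", "cheap pills free offer")
f.train("ham", "meeting agenda for tomorrow")
f.train("ham", "project status report attached")
print(f.classify("cheap pills offer"))   # spam
```

Notice that the filter is only as good as the hand-labeled training data -- which is exactly the "considerable human intervention" I mentioned, and why I keep having to whitelist the same senders at work.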
The company I work for uses a Bayesian filter for its email and I am constantly whitelisting the same sources of email because they subtly alter their email structures (it doesn't help that they sometimes send me 2-3 emails a day).
I don't think, based on what I have read and seen so far, that Google has implemented such a filter. But I have long suspected they may be developing one, and I decided I'd better toss it into the mix of hyperspeculation.