Posted 06 September 2008 - 02:42 AM
Bots can submit forms, of course, by pre-populating the fields and then sending an HTTP POST request to the URL defined in the form's action attribute.
That's why CAPTCHAs can stop auto-submits.
Having a hidden checkbox will cut down on the spam submissions you have to look at. Real users can't see it to check it. So, if it's checked ....
.... It's binned.
Randomly changing the name of the submit button and checking a session value for a match really screws them up though.
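For anyone curious, here's a rough sketch of those two checks in Python. The field names ("website_url" for the honeypot, "submit_name" for the session key) and the session dict are just made up for illustration:

```python
import secrets

def make_submit_name(session):
    """Render-time helper: pick a random name for the submit button
    and remember it in this visitor's server-side session."""
    name = "btn_" + secrets.token_hex(8)
    session["submit_name"] = name
    return name

def is_spam_submission(form, session):
    """Apply the honeypot and randomized-submit-name checks to a
    submitted form (a dict of field names -> values)."""
    # Honeypot: this field is hidden with CSS, so a real user never
    # fills or checks it. If it comes back with a value, it's binned.
    if form.get("website_url"):
        return True
    # The form was rendered with a per-session submit-button name;
    # a bot replaying a scraped copy of the form won't have it.
    if session.get("submit_name") not in form:
        return True
    return False
```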
Posted 06 September 2008 - 06:56 AM
Perhaps I'm crazy, but to me it would have made a lot of sense for the Google crawl team to think ahead a bit and implement some easy way for webmasters to opt out of such a radical change in the way their bot does business. They'll probably say they did provide an opt-out, since you can use the POST method, but as their own Chrome browser shows, that isn't always an ideal solution. Chrome's back button doesn't work well with POST forms, yet it seems quite happy to work without issues on GET forms.
IMHO it would be far more sensible to have a simple instruction one could place in robots.txt, in the meta robots tag, or even in a new attribute of the <form> tag that would tell the bots to leave the form alone, no matter what submit method the form uses. One shouldn't be forced to exclude an entire page just to keep Googlebot from submitting a form that appears on it.
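Just to make the idea concrete, the opt-out could look something like this. To be clear, none of this is real, supported syntax; it's purely the sort of thing being proposed:

```
# Hypothetical robots.txt rule (invented syntax): leave forms alone site-wide
User-agent: Googlebot
Noforms: /
```

Or, equally hypothetically, an attribute on the form itself, something like `<form action="/search" method="get" robots="noform">`.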
As things stand at the moment, I wouldn't consider what they're doing with forms when crawling to be anywhere near friendly, no matter what they may state on the official Webmasters blog. Which is silly, considering how easy it would be for them to implement something webmasters could use to tell Googlebot to leave the forms, and only the forms, alone.
Posted 06 September 2008 - 08:08 AM
But it's their mess. They're not gonna penalize people cuz they make a mess. I refuse to believe they would do that.
Posted 06 September 2008 - 11:37 AM
That's my fear, Jill: that the spam team would suddenly see all of these crap pages on a site being indexed and not understand that the source of those crap pages is Googlebot and its form-submission routine itself.
Not that I think we've seen this happen yet, mostly because from the beginning they tested the Formbot on highly respected, usually larger, sites. So it would take a good while for the Formbot to "create" enough junk pages to have any negative effect.
Now if they did this on a site that previously had 50 pages, and all of a sudden the spam team saw a jump to 20,000 pages with 19,950 of them being complete junk produced by Google's Formbot, I can see how the spam team or spam algo might slap the site down, without anyone even realizing that the whole apparent spam problem was created by Google themselves!
This is one reason why Google absolutely needs to give webmasters a way to opt out.
Not to mention the server load that some site searches can cause if over-used, such as on an x-Cart site with several thousand items, each having a couple dozen attributes. The native site-search functionality built into x-Cart can bring a server to its knees in some cases.
Of course I don't expect Google to actually do anything about it until someone's site really gets hurt and the cause can be traced back to Google's Formbot adding thousands and thousands of junk pages, and then only because that someone threatens to sue them for penalizing a site for something Googlebot itself did.
Talk about a Catch-22!
Posted 06 September 2008 - 09:01 PM
I just refer to it as Formbot in hopes that maybe Google might get a clue and give people a way to block it, such as giving the thing a different name from the normal Googlebot, or some other exclusion method that affects only the form-submitting bot.
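For instance, if the form crawler announced itself under its own user-agent name, say "Googlebot-Forms" (a name I'm inventing here), a standard robots.txt rule would be all it takes to shut it out:

```
# The user-agent name below is hypothetical; the rule syntax is ordinary robots.txt
User-agent: Googlebot-Forms
Disallow: /
```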
Posted 07 September 2008 - 01:57 AM
That's exactly what I said to myself when I first discovered the pages in the index. It just wouldn't make sense for them to negatively affect a site through their own experiment or new crawl methods. The problem with this initial assumption is that it's not humans making that choice; it's just a calculator built by humans.
This is kind of my point about why I now think it may be hurting the site (and not only because I'm down a spot). My point is more like: "the algo might slap the site down without realizing that the new bad pages exist because the new baby crawler is being very, very bad".
Maybe today the algo says "if 20% = good AND 80% = supplemental THEN minus 1 point". Maybe tomorrow the algo will say "if 20% = good AND 80% = supplemental THEN minus 1 point, EXCEPT if all the supplemental = Formbot results THEN no change".
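Put into code form, purely as an illustration (every threshold and flag here is invented; nobody outside Google knows the real logic):

```python
def supplemental_penalty(good_frac, supp_frac, supp_all_from_formbot):
    """Speculative sketch of the 'today vs. tomorrow' algo described above."""
    if good_frac <= 0.20 and supp_frac >= 0.80:
        # "Tomorrow's" version: no penalty if every supplemental
        # page traces back to Formbot's own submissions.
        if supp_all_from_formbot:
            return 0
        # "Today's" version: dock the site a point.
        return -1
    return 0
```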
Has tomorrow happened yet? Or will tomorrow only come when this happens, as Randy put it: "Of course I don't expect Google to actually do anything about it until someone's site really gets hurt and the cause can be traced back to Google's Formbot adding thousands and thousands of junk pages, and then only because that someone threatens to sue them for penalizing a site for something Googlebot itself did." (BTW: I work for a law firm...)
Of course this is all speculation, but that's my worry (or should I say that's my hard head).
Posted 07 September 2008 - 08:59 AM
But I sure do wish they'd at least recognize and admit that their new form-crawling <cough>feature</cough> introduces the possibility of them dynamically creating and indexing a lot of pages of questionable value.
Only when they recognize this will they consider devoting some time and brain power to giving webmasters a way to manually opt out of the program, or to making their crawler smart enough to automatically not index these pages of questionable virtue that it manages to create all by itself.
Posted 10 September 2008 - 04:46 AM
Google is typically doing this mostly for sites it thinks are large?
I have a meta "nofollow, index" directive on my search-results pages. I've seen Google take random words out of the content and try to run a search for them; so far I've only seen it search with single words, not phrases.
It shouldn't be that difficult for webmasters to add a few lines of code that insert a noindex,nofollow directive when displaying search results, should it?
Posted 10 September 2008 - 06:43 AM
True, Don. It's pretty simple from a coding perspective, really. You could even set the logic up so it only produces the meta robots tag when a search returns no results, so that useful results pages still get indexed. That's the approach I intend to take with my site-search pages.
FTR, I'm going to use noindex, follow instead of noindex, nofollow. I doubt it will make a noticeable difference either way, but it doesn't bother me if the spider follows links on the search-results pages, since those are mostly just my normal navigation.
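For anyone who wants to try it, here's a rough sketch of that conditional tag in Python. The function name is made up, and it assumes you already know the result count when rendering the page:

```python
def robots_meta_for_search(result_count):
    """Emit a meta robots tag only for empty search-results pages.

    noindex keeps bot-generated junk searches out of the index,
    while follow lets the spider keep crawling the normal navigation
    links that appear on the page.
    """
    if result_count == 0:
        return '<meta name="robots" content="noindex,follow">'
    return ""  # pages with real results stay indexable
```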