Url Removal Problem


26 replies to this topic

#16 piskie

    HR 7

  • Active Members
  • 1,092 posts
  • Location:Cornwall

Posted 06 September 2008 - 02:17 AM

Have I been wrong all these years in thinking that bots just don't fill in forms and then press a button?

#17 chrishirst

    A not so moderate moderator.

  • Moderator
  • 5,881 posts
  • Location:Blackpool UK

Posted 06 September 2008 - 02:42 AM

Yep. You've been wrong. :)

Bots can submit forms, of course, by pre-populating the fields and then sending an HTTP POST request to the URL defined in the form's action attribute.

That's why CAPTCHAs can stop automated submissions.
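To make Chris's point concrete, here is a minimal Python sketch of what such a bot does, assuming a hypothetical contact form at `http://www.example.com/contact.php` with `name` and `message` fields. Nothing is rendered and no button is clicked; the script just encodes the pre-filled fields and addresses a POST request to the form's action URL. It only builds the request; passing it to `urllib.request.urlopen()` would actually submit it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(action_url: str, fields: dict[str, str]) -> Request:
    """Build, but do not send, the POST request a form-submitting bot would make."""
    # Encode the pre-populated fields the same way a browser encodes a standard HTML form
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical form: the URL is whatever the form's action attribute says
req = build_form_post(
    "http://www.example.com/contact.php",
    {"name": "bot", "message": "buy now"},
)
print(req.get_method(), req.full_url)  # POST http://www.example.com/contact.php
print(req.data)                        # b'name=bot&message=buy+now'
```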

Having a hidden checkbox will cut down on the spam submissions you have to look at. Real users can't see it, so they can't check it. So if it's checked... it's binned.
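A minimal sketch of the hidden-checkbox trap in Python, assuming a hypothetical field named `hp_confirm` that is hidden from humans with CSS. Browsers don't send unchecked checkboxes at all, so any value for that field means something other than a real visitor filled in the form.

```python
HONEYPOT_FIELD = "hp_confirm"  # hypothetical name for the hidden checkbox

# Markup for the trap: humans never see it, so they never tick it
HONEYPOT_HTML = (
    '<div style="display:none">'
    f'<input type="checkbox" name="{HONEYPOT_FIELD}" value="1">'
    "</div>"
)


def handle_submission(form: dict[str, str]) -> str:
    """Bin the submission if the invisible checkbox came back checked."""
    # Unchecked checkboxes are omitted from form data, so any value means "checked"
    if form.get(HONEYPOT_FIELD):
        return "binned"
    return "queued for review"


print(handle_submission({"message": "Hello"}))                   # queued for review
print(handle_submission({"message": "spam", "hp_confirm": "1"}))  # binned
```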

Randomly changing the name of the submit button and checking a session value for a match really screws them up though.
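A sketch of the randomized submit-button trick, assuming a hypothetical server-side `session` dictionary (a real site would use its framework's session store). Each page view gets a fresh random button name stored in the session; a submission passes only if it carries the name issued to that visitor, so a bot replaying pre-populated fields with an old button name gets rejected.

```python
import secrets


def render_form(session: dict) -> str:
    """Render the form with a freshly randomized submit-button name."""
    button_name = "submit_" + secrets.token_hex(8)
    # Remember which name this visitor was given
    session["expected_button"] = button_name
    return (
        '<form method="post" action="/contact">'
        '<input type="text" name="message">'
        f'<input type="submit" name="{button_name}" value="Send">'
        "</form>"
    )


def is_valid_submission(session: dict, form: dict[str, str]) -> bool:
    """Accept only submissions carrying the button name issued for this session."""
    # pop() makes each issued name single-use
    expected = session.pop("expected_button", None)
    return expected is not None and expected in form


session = {}  # hypothetical stand-in for a real per-visitor session store
html = render_form(session)
issued = session["expected_button"]
print(is_valid_submission(session, {"message": "hi", issued: "Send"}))  # True
print(is_valid_submission(session, {"message": "hi", issued: "Send"}))  # False: name already used
```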

#18 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 06 September 2008 - 06:56 AM

True, Chris, but you're not dreaming either, Piskie. Until earlier this year the search engine bots didn't submit search forms. I've not researched it with the other engines, but to my knowledge only Google does it now, mainly because of their push to discover more of the Invisible Web.

Perhaps I'm crazy, but to me it would have made a lot of sense for the Google crawl team to think ahead a bit and implement some easy way for Webmasters to opt out of such a radical change in the way their bot does business. They'll probably say they did provide an opt-out, since you can use the POST method, but as their own Chrome browser shows, this isn't always an ideal solution. Its back button doesn't work well with POST forms, but seems quite happy to work without issues for GET forms.

IMHO it would be far more sensible to have a simple instruction one could place in robots.txt, in a meta robots tag, or even in a new attribute of the <form> tag that would tell the bots to leave the form alone, no matter what submit method the form used. One shouldn't be forced to exclude entirely any page a form appears on just to keep Googlebot from submitting the form.

As things stand, I wouldn't consider what they're doing with forms when crawling to be anywhere near friendly, no matter what they may state in the official Webmasters blog. That's silly, considering how easy it would be for them to implement something webmasters could use to tell Googlebot to leave the forms, and only the forms, alone.

#19 Jill

    High Rankings Advisor

  • Admin
  • 32,317 posts

Posted 06 September 2008 - 08:08 AM

It's really ridiculous the way they submit forms. They seem to use random words from the pages or from God knows where, and it creates a huge mess.
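Why it turns into such a mess: a GET form puts its fields into the URL's query string, so every word the crawler tries yields a new, distinct URL that can end up in the index. A POST form carries its fields in the request body instead, which is presumably why POST serves as the opt-out Randy mentioned. A minimal Python sketch, assuming a hypothetical search form at `http://www.example.com/search.php` with a single `q` field:

```python
from urllib.parse import urlencode, urlsplit


def get_submission_url(action_url: str, fields: dict[str, str]) -> str:
    """Return the URL requested when a GET form is submitted with these fields."""
    # For GET forms the encoded fields become the URL's query string
    return urlsplit(action_url)._replace(query=urlencode(fields)).geturl()


# Words a crawler might lift from page content (illustrative only)
words = ["widgets", "blue", "shipping"]
urls = [get_submission_url("http://www.example.com/search.php", {"q": w}) for w in words]
for u in urls:
    print(u)
# http://www.example.com/search.php?q=widgets
# http://www.example.com/search.php?q=blue
# http://www.example.com/search.php?q=shipping
```

Three words, three new URLs; scale that up to every word on a site and the page count balloons.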

But it's their mess. They're not gonna penalize people cuz they make a mess. I refuse to believe they would do that.

#20 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 06 September 2008 - 11:37 AM

Unless the spam team doesn't truly understand the mess the crawl team has made of it with this particular bugaboo.

That's my fear, Jill. That the spam team would suddenly see all of these crap pages being indexed on a site and not understand that the source of those crap pages is Googlebot and its form submission routine itself.

Not that I think we've seen this happen yet, mostly because from the beginning they tested the Formbot on highly respected, usually larger sites. So it would take a good while for the Formbot to "create" enough junk pages to have any negative effect.

Now if they did this on a site that previously had 50 pages, and all of a sudden the spam team saw a jump to 20,000 pages with 19,950 of them being complete junk produced by Google's Formbot, I can see how the spam team or spam algo might slap the site down. Without anyone even realizing that the whole apparent spam problem was created by Google themselves!

This is one reason why Google absolutely needs to give webmasters a way to opt out.

Not to mention the server load that some site searches can cause when overused. Take an X-Cart site with several thousand items, each having a couple dozen attributes: the native site search built into X-Cart can bring a server to its knees in some cases.

Of course I don't expect Google to actually do anything about it until someone's site really gets hurt and the cause can be tracked back to Google's Formbot adding thousands and thousands of junk pages. And then only because someone threatens to sue them for penalizing a site for something Googlebot itself did. LOL

Talk about a Catch-22!

#21 piskie

    HR 7

  • Active Members
  • 1,092 posts
  • Location:Cornwall

Posted 06 September 2008 - 05:19 PM

Randy, you used the term "Formbot". Is that a different spider that could be excluded independently of the normal Googlebot?

#22 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 06 September 2008 - 09:01 PM

Not from the logs I've seen, piskie. The bot that submits forms shows up as simply Googlebot.

I just refer to it as Formbot in hopes that Google might get a clue and give people a way to block it. Say, by giving the [censored] thing a different name from the normal Googlebot, or by providing some other exclusion method that only affects the form-submitting bot.

#23 Dantek

    HR 2

  • Members
  • 14 posts

Posted 07 September 2008 - 01:57 AM

QUOTE(Jill @ Sep 6 2008, 09:08 AM)
But it's their mess. They're not gonna penalize people cuz they make a mess. I refuse to believe they would do that.

That's exactly what I said to myself when I first discovered the pages in the index. It just wouldn't make sense for them to hurt a site because of their own experiment or new crawl methods. The problem with this initial assumption is that it's not humans making that choice; it's just a calculator built by humans.

QUOTE(Randy @ Sep 6 2008, 12:37 PM)
I can see how the spam team or spam algo might slap the site down. Without anyone even realizing that the whole apparent spam problem was created by Google themselves!

That's kind of why I now think it may be hurting the site (and not only because I'm down a spot). My point is more like: "the algo might slap the site down without realizing that the new bad pages exist because the new baby crawler is being very, very bad".

Maybe today the algo says "if 20%=good AND 80%=supplemental THEN minus 1 point". Maybe tomorrow the algo will say "if 20%=good AND 80%=supplemental THEN minus 1 point EXCEPT if all the supplemental=formbot results THEN no change".

Has tomorrow happened yet? Or will tomorrow only come when this happens, as Randy put it: "Of course I don't expect Google to actually do anything about it until someone's site really gets hurt and the cause can be tracked back to Google's Formbot adding thousands and thousands of junk pages. And then only because someone threatens to sue them for penalizing a site for something Googlebot itself did." (BTW: I work for a law firm. LOL)

Of course this is all speculation, but that's my worry (or should I say, that's my hard head). ;)


#24 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 07 September 2008 - 08:59 AM

Whether your site is being nicked because of the crawler activity is really a sidebar issue, Dantek, though I imagine it doesn't feel like a sidebar issue when you're smack in the middle of it yourself. Knock wood. One would hope and expect it isn't having any negative effect, but who can say with 100% certainty? I doubt even Google's engineers could, considering the number of factors that go into why a site ranks where it ranks.

But I sure do wish they'd at least recognize and admit that their new form crawling <cough>feature</cough> introduces the possibility of them dynamically creating and indexing a lot of pages of questionable value.

Only when they recognize this will they consider devoting some time and brain power to giving webmasters a way to opt out of the program manually, or to making their crawler smarter by building in a function that automatically keeps it from indexing these pages of questionable virtue they've managed to create all by themselves.

#25 don h

    HR 4

  • Active Members
  • 188 posts

Posted 10 September 2008 - 04:46 AM


Is Google typically doing this mostly for sites it thinks are large?

I have a meta nofollow, index directive on my search results pages. I've seen Google take random words out of the content and try to run a search for them. I believe I've only seen it search on single words so far, not phrases.

It shouldn't be that difficult for Webmasters to add a few lines of code to insert a noindex,nofollow directive when displaying search results?
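It really is just a few lines. A minimal sketch in Python, assuming a hypothetical `render_search_page()` helper standing in for whatever generates the site-search results template; the only part that matters here is the meta robots tag in the `<head>`:

```python
from html import escape


def render_search_page(query: str, results: list[str]) -> str:
    """Render a site-search results page that asks robots not to index it or follow its links."""
    items = "".join(f"<li>{escape(r)}</li>" for r in results)
    return (
        "<html><head>"
        '<meta name="robots" content="noindex,nofollow">'
        # Escape the user-supplied query before echoing it back
        f"<title>Search: {escape(query)}</title>"
        "</head><body>"
        f"<ul>{items}</ul>"
        "</body></html>"
    )


print(render_search_page("widgets", ["Blue widget", "Red widget"]))
```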



#26 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 10 September 2008 - 06:43 AM

QUOTE
It shouldn't be that difficult for Webmasters to add a few lines of code to insert a noindex,nofollow directive when displaying search results?


True, Don. Pretty simple, really, from a coding perspective. You could even set up the code logic so that it only produces the meta robots tag when the search produces no results, so that useful pages do get indexed. That's the approach I intend to take with my site search pages.

FTR, I'm going to use noindex, follow instead of noindex, nofollow. I'm not sure it's going to make any difference at all; in fact, I doubt it will make a noticeable one. But it doesn't bother me if the spider follows links on the search results pages, since for the most part they'll just contain my normal navigation.
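Randy's variation, emitting the tag only when a search comes back empty and using `noindex, follow`, might look like this. A sketch assuming a hypothetical `robots_meta_for_search()` helper whose return value gets dropped into the page `<head>`:

```python
def robots_meta_for_search(result_count: int) -> str:
    """Return a meta robots tag for empty search-result pages, otherwise nothing."""
    if result_count == 0:
        # Keep the empty page out of the index, but let spiders follow the normal navigation
        return '<meta name="robots" content="noindex, follow">'
    # Searches that found something stay indexable
    return ""


print(robots_meta_for_search(0))         # <meta name="robots" content="noindex, follow">
print(repr(robots_meta_for_search(12)))  # ''
```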

#27 Jill

    High Rankings Advisor

  • Admin
  • 32,317 posts

Posted 10 September 2008 - 11:32 AM

They don't have to be large sites. They were doing it on a small WordPress blog of mine. I think they love to do it with WP sites.



