
How Many People Here Filter Bots?


5 replies to this topic

#1 lister


    HR 5

  • Active Members
  • 415 posts

Posted 31 January 2014 - 11:57 AM

I guess the answer is that we should all be filtering bots that hit our servers, right?

I mean, there are obvious bots like Googlebot, and services like uptime monitoring...

 

I guess my question is: how do you do it effectively, and how many of you do it?



#2 Jill


    Recovering SEO

  • Admin
  • 32,916 posts

Posted 31 January 2014 - 02:26 PM

You can't. They don't obey.



#3 Michael Martinez


    HR 10

  • Active Members
  • 5,064 posts
  • Location:Georgia

Posted 31 January 2014 - 08:11 PM

I block Web hosting provider IP addresses in my firewall, but only for the more aggressive, malicious bots and botnets. However, considering that a lot of them run on Amazon Web servers, I am sure I am getting a lot of the crawlers that execute Javascript.
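
A minimal sketch of the kind of range check a firewall rule performs, assuming you already have a list of hosting-provider CIDR blocks you want to reject. The ranges below are documentation-only placeholders (RFC 5737), and `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names, not part of any firewall product:

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges,
# not real hosting-provider networks; substitute your own list.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(ip, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

In practice the blocking itself would happen in the firewall; a check like this is mainly useful for testing a candidate list against addresses pulled from your logs before you deploy it.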

Google Analytics is notorious for allowing Javascript-executing bots to screw up your traffic data. But if you scrutinize your raw server logs for self-identifying user agents (make sure they fetch Javascript files from your site, if you serve any), you can filter their IP addresses in Google Analytics' admin section, although setting up the filters is tedious.
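
One way to do that log scrutiny, sketched under the assumption that your server writes the common "combined" access log format. The keyword list used to spot self-identifying agents is a rough heuristic, and `js_fetching_bots` is a hypothetical helper name:

```python
import re
from collections import defaultdict

# Assumes the "combined" access log format:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Rough heuristic for agents that announce themselves as bots.
BOT_HINT = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def js_fetching_bots(lines):
    """Map IP -> set of bot-like user agents that requested a .js file."""
    found = defaultdict(set)
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path.endswith(".js") and BOT_HINT.search(match.group("ua")):
            found[match.group("ip")].add(match.group("ua"))
    return dict(found)
```

The resulting IP addresses are candidates for exclusion filters; it is worth eyeballing the user agents before filtering, since the keyword match can produce false positives.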

You can create a trap for Javascript-executing bots by embedding a simple Javascript on your Website that displays a short phrase (it can be anything but should be sensible because Google and Bing both attempt to execute the Javascript). Then just extract those hits from your raw server log files.

At this point you sort the data by IP address and only look at the most active IP addresses over a 5-, 10-, or 30-day period. That should show you which bots are interested in Javascript. The ones most likely to execute the analytics code on your server will probably be in that group.
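
The extract-and-sort step can be sketched as follows, assuming the trap script is served from a dedicated URL and the log uses the common/combined format with `[31/Jan/2014:11:57:00 +0000]`-style timestamps. `TRAP_PATH` and `top_trap_ips` are hypothetical names chosen for illustration:

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

TRAP_PATH = "/js/trap.js"  # hypothetical URL of the trap script

# Only the leading fields are needed: ip, timestamp, and request path.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "\S+ (?P<path>\S+)'
)


def top_trap_ips(lines, end, days=30, limit=10):
    """Count trap hits per IP in the `days` before `end`, most active first.

    `end` must be a timezone-aware datetime.
    """
    start = end - timedelta(days=days)
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        if match.group("path").split("?", 1)[0] != TRAP_PATH:
            continue  # not a hit on the trap script
        stamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if start <= stamp <= end:
            counts[match.group("ip")] += 1
    return counts.most_common(limit)
```

Calling it with `days=5`, `days=10`, and `days=30` over the same log gives the three windows mentioned above; IPs that stay near the top in all three are the ones worth investigating first.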

Yahoo!'s SLURP still runs in the wild. I have no idea why they continue to crawl the Web. I have been blocking SLURP ever since Yahoo! shut down its search engineering team a few years ago. Maybe someone forgot to turn out the lights. But there have been a few notable occasions where Yahoo! has created massive data distortion in Google Analytics reports because GA mistook it for something else (one time for a mobile user from Ohio, I think).

#4 ttw


    HR 5

  • Active Members
  • 372 posts
  • Location:San Mateo, California

Posted 13 February 2014 - 05:47 PM

We just went through an exercise with our Google Analytics guy to filter a lot of the bot traffic that is executing JavaScript. We were seeing a 70% bounce rate for Direct Traffic, and when we explored it further we found that we had all this 100% bounce traffic.

 

Our GA guy told us it was better to filter this out through GA instead of at the server level, even though this traffic makes the server work harder.

 

We're waiting for enough data to see how the Bounce Rate changes.



#5 chrishirst


    A not so moderate moderator.

  • Moderator
  • 6,788 posts
  • Location:Blackpool UK

Posted 13 February 2014 - 06:40 PM

'Bounce' rate is only meaningful when a single source of traffic is analysed or specific advertising campaign "key" words are analysed.

 

'Direct' traffic 'bounces' for many reasons, most of which have nothing at all to do with your site pages or the content.



#6 Michael Martinez


    HR 10

  • Active Members
  • 5,064 posts
  • Location:Georgia

Posted 14 February 2014 - 11:45 AM

There is no advantage to filtering in Google Analytics; in fact, that's a disadvantage because when you allow the bots to crawl your server you leave yourself open to DDoS attacks, which are now widespread and quite devastating.

You can look at the recent history of DDoS attacks across the Web here:

http://www.digitalattackmap.com/

Some sites seem to be targeted more often than others. But the bots and the attacks have many different purposes. Most of this malicious activity is just automated stuff coming out of botnets that are trying to expand.

Most of the bots I see attacking my servers are coming from Web hosting companies. A lot of blogs using older software have been compromised and turned into spam/hacking proxies.



