



Tracking Spiders


21 replies to this topic

#1 Vertster

Vertster

    Google wristbar installed

  • Active Members
  • PipPipPipPipPip
  • 327 posts
  • Location:Salt Lake City, Utah

Posted 29 September 2003 - 02:39 PM

Curious what software and techniques everyone is using to track spiders hitting their site?

I have been demoing the "robots.txt" editor/log analyzer. It is great: it quickly parses through large logs, pulls out just the spider hits, and shows which spider hit which files on which day. However, I am having trouble getting it to read my IIS log files. I am thinking it might have some bugs, because it will not let me edit the W3C Extended format.

I also use an older log analyzer called Surfstats. It shows what spiders have visited, and which pages on which days. Problem is it's slower than molasses on a cold winter morning!

What other packages do people use that can give this kind of "what did the spiders crawl today?" functionality?

#2 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,379 posts

Posted 29 September 2003 - 03:09 PM

Vertster, didn't we cover this over in this thread? Or are you talking about something different?

Jill

#3 Vertster

Vertster

    Google wristbar installed

  • Active Members
  • PipPipPipPipPip
  • 327 posts
  • Location:Salt Lake City, Utah

Posted 29 September 2003 - 03:26 PM

Hmmm... well that thread was similar. But I am talking about specifically tracking spiders, and NOT humans. Clicktracks is so useful and fast, because it first strips out all the non-human and file entries in the log files, and analyzes what is left. Tracking spiders is useful, because you can identify patterns in how quickly pages are being added into the SERPs, and so forth. The app I mentioned works the opposite of Clicktracks. It strips everything out of the log files, except the spiders. Then it gives you a report that only shows what the spiders are doing. Useful data, for many reasons, but my favorite is playing "prophet" with my clients and saying "Your site will appear on Google in 6 days!" with some amount of accuracy. We have a lot of sites that Googlebot is spidering every single day and adding new pages within 48-72 hours.
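The "strip everything except the spiders" idea can be sketched in a few lines of Python. This is only an illustration, not how the product mentioned above works. It assumes a W3C Extended log with a `#Fields:` directive and a `cs(User-Agent)` column in which spaces are written as `+` (as IIS normally does), so each entry splits cleanly on whitespace. The `SPIDER_TOKENS` list and the sample log lines are made up for the example.

```python
# Hypothetical list of user-agent substrings; extend it for the spiders you care about.
SPIDER_TOKENS = ("googlebot", "slurp", "msnbot", "ia_archiver")


def spider_hits(lines, tokens=SPIDER_TOKENS):
    """Yield one dict per log entry whose user agent matches a spider token."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns for the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            # Skip blank lines, other directives, and entries seen before any #Fields line.
            continue
        entry = dict(zip(fields, line.split()))
        agent = entry.get("cs(User-Agent)", "").lower()
        if any(token in agent for token in tokens):
            yield entry


# Made-up sample data in W3C Extended format.
sample = [
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent)",
    "2003-09-29 14:39:02 64.68.82.1 GET /index.html 200 Googlebot/2.1+(+http://www.googlebot.com/bot.html)",
    "2003-09-29 14:40:10 10.0.0.5 GET /index.html 200 Mozilla/4.0+(compatible;+MSIE+6.0)",
]
hits = list(spider_hits(sample))
for hit in hits:
    print(hit["date"], hit["cs-uri-stem"], hit["cs(User-Agent)"])
```

Matching on user-agent substrings is the simplest approach; it will miss spiders that are not in the list and can be fooled by a faked user agent.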

#4 sarahk

sarahk

    HR 1

  • Members
  • Pip
  • 7 posts

Posted 02 November 2003 - 02:16 AM

Hi Vertster

There are a few scripts about. Mine's the best (of course). Have a look at www.pcpropertymanager.com/botspotter/home.php. It's free, so there's no problem with trying it to see if it suits you.

#5 merrick_lozano

merrick_lozano

    HR 2

  • Active Members
  • PipPip
  • 12 posts

Posted 02 November 2003 - 01:16 PM

Hi everyone,

This is my first post here, but some of you may recognize me from other forums.

I have tried out some open source software on several sites and programmed my own for PR Leap. If your server can handle PHP and MySQL, you might like the following scripts.

RobotStats
------------
If you just want to track robots, try RobotStats. It is open source and used to be known as Google Stats. I have not tested it since it changed to RobotStats, but the Google Stats version was pretty robust. Unfortunately the demo and screenshots are not up right now, but this program tracks all of the major search engines and lets you see when they came, where they went, and which others were present at the same time.

phpOpenTracker
-------------------
This software is not dedicated to robots, and it will only show them as having shown up if you run the scripts on the same site and do not use the webbug feature. Also beware of the frequent database schema changes: Sebastian (also known for his PHP SOAP Google API) likes to optimize his code and will change the database structure about once every 6 months.

#6 Scottie

Scottie

    Psycho Mom

  • Admin
  • 6,294 posts
  • Location:Columbia, SC

Posted 02 November 2003 - 07:14 PM

Welcome, Merrick! :rant:

#7 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,379 posts

Posted 02 November 2003 - 07:48 PM

Welcome, Merrick! ;)

Jill

#8 don1

don1

    HR 4

  • Active Members
  • PipPipPipPip
  • 173 posts
  • Location:Marlborough, MA

Posted 02 November 2003 - 08:06 PM

I'm listening, Vertster. I want to check if the spiders are hitting some new pages on my site. I know they are hitting the index. Are you saying Clicktracks is the answer? What about this RobotStats? I can analyze humans quite effectively with Urchin but just want to check up on the robots. How is it that you have them coming back every day? Are you setting the revisit-after meta tag to 1 day?

Edit: My host does not have MySQL on the server so that is my limiting factor.

Edited by don1, 02 November 2003 - 08:26 PM.


#9 merrick_lozano

merrick_lozano

    HR 2

  • Active Members
  • PipPip
  • 12 posts

Posted 03 November 2003 - 03:05 AM

Thanks, Jill and scottiecl. Glad to be here.

#10 Matt B

Matt B

    The modem is the message.

  • Active Members
  • PipPipPipPipPipPip
  • 558 posts
  • Location:Canton, OH

Posted 03 November 2003 - 09:46 AM

So far, NetTracker has shown some of the best bot tracking I've seen in a product. It allows you to see most visits/page requests, and see what pages were requested by each bot in each session. Pretty impressive stuff.

#11 Vertster

Vertster

    Google wristbar installed

  • Active Members
  • PipPipPipPipPip
  • 327 posts
  • Location:Salt Lake City, Utah

Posted 03 November 2003 - 11:55 AM

This new Robots.txt editor thing that I got is great for checking out just the bots. It has a large database of known robots, and shows you exactly what they are doing on your site. You can view reports by:

Date: which bots hit your site each day.
Robot: which days each robot hit your site.
Page of your site: which robots have crawled a particular page on your site.

It will also help you build and maintain a robots.txt file that is syntactically correct. And the best part may be the price: only 50 bucks or so.
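For anyone curious how the three report views described in this post could be built from already-filtered spider hits, here is a rough Python sketch. It is not taken from the product; the `build_reports` name and the sample hits are invented for illustration, and each hit is assumed to be a simple `(date, robot, page)` tuple.

```python
from collections import defaultdict


def build_reports(hits):
    """Group (date, robot, page) tuples into the three views: by date, by robot, by page."""
    by_date = defaultdict(set)   # date  -> robots seen that day
    by_robot = defaultdict(set)  # robot -> days it visited
    by_page = defaultdict(set)   # page  -> robots that crawled it
    for date, robot, page in hits:
        by_date[date].add(robot)
        by_robot[robot].add(date)
        by_page[page].add(robot)
    return by_date, by_robot, by_page


# Made-up sample hits.
hits = [
    ("2003-11-01", "Googlebot", "/index.html"),
    ("2003-11-01", "Slurp", "/index.html"),
    ("2003-11-02", "Googlebot", "/products.html"),
]
by_date, by_robot, by_page = build_reports(hits)

for day in sorted(by_date):
    print(day, sorted(by_date[day]))
```

Sets are used so that repeated hits by the same robot on the same day or page are counted once per view.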

NetTracker starts at a grand (for only 5 sites) and goes up from there. My biggest beef about NetTracker after trying it out is that it's slow and the interface has some real usability problems (IMO). It's no worse than the other web-based stats packages, though.

#12 Vertster

Vertster

    Google wristbar installed

  • Active Members
  • PipPipPipPipPip
  • 327 posts
  • Location:Salt Lake City, Utah

Posted 03 November 2003 - 11:56 AM

And don1, Clicktracks will NOT show you spider activity. It is strictly a user behavior analyzer... and it is very good at that.

#13 don1

don1

    HR 4

  • Active Members
  • PipPipPipPip
  • 173 posts
  • Location:Marlborough, MA

Posted 03 November 2003 - 12:34 PM

Stink! robots.txt only runs on Windoze ;) I like this concept though. If I could track where those spiders are going or, more specifically, not going, I could tweak my site even more. :wub:

#14 Matt B

Matt B

    The modem is the message.

  • Active Members
  • PipPipPipPipPipPip
  • 558 posts
  • Location:Canton, OH

Posted 03 November 2003 - 01:08 PM

I'm not following you on that. Why would you tweak your site for spiders?

The spiders are only going to follow the links on your site, so tweaking won't make much difference in your rankings or conversion rate.

It is the visitors that you should be tweaking for. The spiders will always be there and will always come back; your visitors may not.

#15 sarahk

sarahk

    HR 1

  • Members
  • Pip
  • 7 posts

Posted 03 November 2003 - 01:47 PM

Hi Matt B

You shouldn't be tweaking the site for the engines, beyond the simple (dare I say ethical) coding guidelines.

However, the scenarios that I have tried to answer (with Botspotter) are:

Scenario 1. I've changed a page to include new content, guidelines, info etc about a topic of great interest. I expect it to rank well in the engines and drive visitors to my site who I can then convert into sales.

Two weeks later I'm not getting any more hits or sales, and my SERPs tool says I'm still at #4567. I change my page again and sack the original copywriter.

BUT do I know if the page has been hit by ANY search engines? If it hasn't, then I need to hold fire and wait a bit longer before deciding if the change was worthwhile.
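The question in Scenario 1 ("has any engine fetched this page since I changed it?") boils down to a simple filter over the spider hits. The sketch below is a generic illustration, not Botspotter's code; the `crawled_since` name, the page path, and the dates are all hypothetical.

```python
from datetime import date


def crawled_since(hits, page, changed_on):
    """Return the robots that requested `page` on or after `changed_on`."""
    return {robot for day, robot, url in hits if url == page and day >= changed_on}


# Made-up spider hits as (date, robot, page) tuples.
hits = [
    (date(2003, 10, 20), "Googlebot", "/guide.html"),
    (date(2003, 11, 1), "Slurp", "/index.html"),
]

# The page was changed on 25 October; has any spider fetched it since?
seen = crawled_since(hits, "/guide.html", date(2003, 10, 25))
if seen:
    print("Recrawled by:", sorted(seen))
else:
    print("No spider has fetched the page since the change; wait before judging it.")
```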

Scenario 2. I'm getting emails and seeing info about Site Submission tools. I can pay a lot of money to buy the software, but then I read on the forums that I should only submit, manually, to the main engines and the others will find me. I'm new to all this and the passive approach scares me. However, spamming Google etc and getting banned is just as scary. If only I could see easily which pages were being indexed and by which engine...


I've been in both those positions, and having a tool that tells me what's going on is fantastic. There are a lot of bots around that aren't worth worrying about, which is why I prioritise them.

Another benefit I didn't expect was finding pages turning up on my list with malformed URLs. I use Xenu to check the pages, so I was able to find them again, identify how they were caused, and fix them. I was also able to see that people use a FrontPage bot to hit my site A LOT, and I was able to use my .htaccess to give them content, albeit not what they wanted.
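One rough way to surface malformed URLs from spider hits is to flag requests that returned a 404 or whose path contains a doubled slash. Both rules are assumptions made for this sketch, not a description of what Botspotter or Xenu do, and the `suspect_urls` name and sample data are invented.

```python
from collections import Counter


def suspect_urls(hits):
    """Count spider requests that got a 404 or whose path contains '//'."""
    counts = Counter()
    for url, status in hits:
        if status == 404 or "//" in url:
            counts[url] += 1
    return counts


# Made-up spider hits as (requested path, HTTP status) tuples.
hits = [
    ("/index.html", 200),
    ("/products//widgets.html", 200),
    ("/old-page.htm", 404),
    ("/old-page.htm", 404),
]
suspects = suspect_urls(hits)
for url, count in suspects.most_common():
    print(count, url)
```

The most frequently requested bad URLs are usually the ones worth tracing back to a broken link first.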

Maybe others could post their reasons for wanting to track the spiders...

Sarah



