
Too Many Hits For My Custom 404 Page


11 replies to this topic

#1 johking

johking

    HR 4

  • Active Members
  • 116 posts
  • Location:Scotland

Posted 24 April 2008 - 10:15 AM

I notice that my custom 404 page seems to be getting a load of hits. I have checked all the links and found no problems.

I tried downloading the logs and it appears that it is bots that are visiting but I can't work out why they are getting to that page.

Have I set it up wrong?

In the .htaccess:
CODE
ErrorDocument 404 http://www.mysite.com/custom404.htm


Thanks

Jo

Edited by Randy, 24 April 2008 - 10:53 AM.
Added code tags.


#2 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 24 April 2008 - 10:54 AM

The ErrorDocument instruction looks fine.

You'd need to look deeper. Do your log files show any referring URL for the 404 errors? And is this the major bots hitting a bad spot? Or simply one of the many, many spam bots out there that are up to no good?
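For anyone wanting to answer those questions from a downloaded log, here is a rough Python sketch that tallies status code, referrer and user agent for every request to one path. It assumes the log is in Apache's combined log format; the `tally_hits` helper and the sample lines are made up for illustration and are not taken from Jo's server.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def tally_hits(lines, path):
    """Count (status, referer, user-agent) combinations for requests to `path`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == path:
            key = (match.group("status"), match.group("referer"), match.group("agent"))
            counts[key] += 1
    return counts

# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [24/Apr/2008:10:00:00 +0000] "GET /custom404.htm HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [24/Apr/2008:10:00:05 +0000] "GET /custom404.htm HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [24/Apr/2008:10:01:00 +0000] "GET /index.htm HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

# Most frequent combinations first.
for (status, referer, agent), count in tally_hits(SAMPLE, "/custom404.htm").most_common():
    print(count, status, referer, agent)
```

To run it against a real log, pass the open file instead of `SAMPLE`. A referrer of "-" means the client sent none, which is typical for bots requesting a URL directly.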

#3 johking

johking

    HR 4

  • Active Members
  • 116 posts
  • Location:Scotland

Posted 24 April 2008 - 12:16 PM

QUOTE(Randy @ Apr 24 2008, 04:54 PM)
The ErrorDocument instruction looks fine.

You'd need to look deeper. Do your log files show any referring URL for the 404 errors? And is this the major bots hitting a bad spot? Or simply one of the many, many spam bots out there that are up to no good?


Hi Randy!

mlbot - mean anything?

Jo

#4 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 24 April 2008 - 06:49 PM

It's the one getting the 404?

MLbot is a relatively new one which is supposed to be robots.txt friendly, though I've honestly not seen it in my logs much. I did see it once or twice and read the page it gives as the info page in the server logs. If memory serves, that page said it was a spider that's trying to index media, not web pages. I haven't tried blocking it via robots.txt, since I don't have any media files and there wasn't really anything for it to index in the first place.

I'd try excluding that one right from robots.txt, which should get rid of the 404 hits too.
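As a sanity check before uploading such a rule, the standard library's `urllib.robotparser` can confirm that it blocks the one bot and nobody else. This is only a sketch: the user-agent token "MLBot" is an assumption (take the real token from the bot's info page or your logs), and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; "MLBot" is a guess at the bot's user-agent token.
ROBOTS_TXT = """\
User-agent: MLBot
Disallow: /

User-agent: *
Disallow:
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The excluded bot should be refused, other crawlers should still be allowed.
print(is_allowed(ROBOTS_TXT, "MLBot", "http://www.mysite.com/custom404.htm"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.mysite.com/custom404.htm"))
```

Bear in mind this only helps with bots that actually honour robots.txt.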

#5 MaKa

MaKa

    HR 6

  • Active Members
  • 856 posts
  • Location:Llantwit Major, Wales, UK

Posted 25 April 2008 - 11:03 AM

It's probably not the problem, but have you verified that your 404 page actually returns a 404 code?
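One way to verify it without extra tools is a short Python script using the standard `http.client` module, which does not follow redirects, so a 302 shows up as a 302. Treat this as a sketch: `fetch_status` and `diagnose` are hypothetical helper names, and the URL in the usage comment is a made-up path on the placeholder domain used in this thread.

```python
import http.client
from urllib.parse import urlsplit

def fetch_status(url, timeout=10):
    """Return (status, Location header) for a GET on `url` without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def diagnose(status, location=None):
    """Interpret the response to a request for a page that should not exist."""
    if status == 404:
        return "OK: the server reports a real 404"
    if status in (301, 302, 303, 307, 308):
        return f"Redirect ({status}) to {location}: the client never sees a 404"
    if status == 200:
        return "Soft 404: an error page served with 200 can be indexed"
    return f"Unexpected status {status}"

# Usage (needs network access; the path is deliberately made up):
# status, location = fetch_status("http://www.mysite.com/no-such-page-12345.htm")
# print(diagnose(status, location))
```

Request a URL that definitely does not exist. Requesting the error page's own address directly will normally return 200, because that file really is there.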

#6 projectphp

projectphp

    Lost in Translation

  • Moderator
  • 2,203 posts
  • Location:Sydney Australia

Posted 25 April 2008 - 10:22 PM

AFAIK, including the http:// makes the server redirect to that URL (with a 302, I believe), so the visitor never gets the 404 status. I THINK, but you'll need to test, that
CODE
ErrorDocument 404 /custom404.htm

works better. See http://httpd.apache....l#errordocument

#7 johking

johking

    HR 4

  • Active Members
  • 116 posts
  • Location:Scotland

Posted 27 April 2008 - 05:46 AM

Aha, that is very interesting. I will definitely take out the absolute URL then, but before I do...

I have had another look at the logs and am very puzzled by something.

I found requests for a dodgy PDF that is no longer there, which were triggering a few hits to www.mysite.com/custom404.htm

BUT also there are a load of these (which must be bumping up the stats):
GET /custom-404.htm
which is returning a 200 code

Does that mean that somewhere I have a link to that file that the bots are finding? I just have the custom-404 file in the root, but as far as I can see there are no links to it other than the absolute URL within the .htaccess.

Thanks for all your help so far

Jo

#8 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,324 posts

Posted 27 April 2008 - 09:37 AM

Your 404 page may not be returning a 404 header response, but a 200 OK one instead. You'll definitely want to make sure it's returning an actual 404 so that it doesn't get indexed (under multiple URLs no less!).

#9 projectphp

projectphp

    Lost in Translation

  • Moderator
  • 2,203 posts
  • Location:Sydney Australia

Posted 27 April 2008 - 08:44 PM

Get Webbug for that.

#10 johking

johking

    HR 4

  • Active Members
  • 116 posts
  • Location:Scotland

Posted 28 April 2008 - 03:57 PM

You are dead right - it is returning a 200. Not only that, a made-up URL (i.e. a page that does not exist) is returning a 302!

Time to get on to the hosting company?

Jo



#11 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,324 posts

Posted 28 April 2008 - 03:59 PM

QUOTE
Time to get on to the hosting company?


Yes, if they have control of your 404 error page, and .htaccess file, etc.

#12 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 28 April 2008 - 04:30 PM

Did you change the ErrorDocument instruction in your .htaccess, in case yours is one of those servers that automatically delivers a 302 if the full URL is given?

If not, I'd try that first, following the example given by projectphp above. Then re-test a non-existent URL.

If you still get a 302 it'll be time to get on the host. They may have something in the virtual host configs that we can neither see, nor change.



