Apr 24 2008, 10:15 AM | Post #1
HR 4 | Group: Active Members | Posts: 113 | Joined: 28-May 04 | From: Scotland | Member No.: 3,740
I notice that my custom 404 page seems to be getting a load of hits. I have checked all the links and found no problems.
I tried downloading the logs and it appears that it is bots that are visiting, but I can't work out why they are getting to that page. Have I set it up wrong? In the .htaccess:
CODE
ErrorDocument 404 http://www.mysite.com/custom404.htm
Thanks
Jo
This post has been edited by Randy: Apr 24 2008, 10:53 AM
Reason for edit: Added code tags.

Apr 24 2008, 10:54 AM | Post #2
Convert Me! | Group: Admin | Posts: 17,378 | Joined: 17-August 03 | Member No.: 551
The ErrorDocument instruction looks fine.
You'd need to look deeper. Do your log files show any referring URL for the 404 errors? And are these the major bots hitting a bad spot? Or simply one of the many, many spam bots out there that are up to no good?
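One way to answer the referrer question is to filter the raw access log. The sketch below assumes the server writes Apache's "combined" log format (that is an assumption, check your own LogFormat); the function name `hits_for` and the sample lines are made up for illustration.

```python
import re

# Apache "combined" log format (assumed, not confirmed by the thread):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def hits_for(lines, status=None, path_contains=None):
    """Yield (request, status, referer, agent) for matching log lines."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        if status is not None and m["status"] != str(status):
            continue
        if path_contains and path_contains not in m["request"]:
            continue
        yield m["request"], int(m["status"]), m["referer"], m["agent"]

# Hypothetical sample lines, invented for illustration only.
sample = [
    '192.0.2.1 - - [24/Apr/2008:10:00:00 +0000] "GET /old.pdf HTTP/1.1" 302 210 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [24/Apr/2008:10:00:01 +0000] "GET /custom404.htm HTTP/1.1" 200 1500 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [24/Apr/2008:10:05:00 +0000] "GET /index.htm HTTP/1.1" 200 5120 "http://www.example.com/" "Mozilla/5.0"',
]

for request, status, referer, agent in hits_for(sample, path_contains="custom404"):
    print(status, request, "referer:", referer, "agent:", agent)
```

With a real log you would pass the open file instead of `sample`. A referer of "-" on every hit suggests the bot is not following a link on your pages.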

Apr 24 2008, 12:16 PM | Post #3
HR 4 | Group: Active Members | Posts: 113 | Joined: 28-May 04 | From: Scotland | Member No.: 3,740
QUOTE
The ErrorDocument instruction looks fine. You'd need to look deeper. Do your log files show any referring url for the 404 errors? And is this the major bots hitting a bad spot? Or simply one of the many, many spam bots out there that are up to no good?
Hi Randy!
mlbot - mean anything?
Jo

Apr 24 2008, 06:49 PM | Post #4
Convert Me! | Group: Admin | Posts: 17,378 | Joined: 17-August 03 | Member No.: 551
It's the one getting the 404?
MLbot is a relatively new one which is supposed to be robots.txt friendly, though I've honestly not seen it in my logs much. I did see it once or twice and read the info page it gives in the server logs. If memory serves, that said it was a spider that's trying to index media, not web pages.
I haven't tried blocking it via robots.txt myself, since I don't have any media files so there wasn't really anything for it to index in the first place. I'd try excluding that one right from robots.txt, which should get rid of the 404 hits too.
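As a rough sketch of that exclusion, the snippet below holds a candidate robots.txt and checks it with Python's standard `urllib.robotparser` before it goes live. The user-agent token `MLBot` is an assumption, so use whatever name the crawler reports in your own logs. The check only shows what a well-behaved bot should do; it cannot prove this bot obeys the file.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt content. "MLBot" is an assumed user-agent token;
# confirm the exact name from your own server logs.
ROBOTS_TXT = """\
User-agent: MLBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant MLBot should be refused everywhere, other bots allowed.
print(parser.can_fetch("MLBot", "/custom404.htm"))
print(parser.can_fetch("SomeOtherBot", "/custom404.htm"))
```

The first check should print False and the second True. The empty `Disallow:` under `*` leaves all other crawlers unrestricted.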

Apr 25 2008, 11:03 AM | Post #5
HR 6 | Group: Active Members | Posts: 848 | Joined: 21-November 05 | From: Ogmore-by-Sea, Wales, UK | Member No.: 9,487
It's probably not the problem, but have you verified that your 404 page actually returns a 404 code?

Apr 25 2008, 10:22 PM | Post #6
Lost in Translation | Group: Moderator | Posts: 2,202 | Joined: 5-August 03 | From: Sydney Australia | Member No.: 283
AFAIK, adding the http:// makes the server redirect to that URL (with a 302) instead of serving the error page directly. I THINK, but you'll need to test, that
CODE
ErrorDocument 404 /custom404.htm
works better. See http://httpd.apache.org/docs/1.3/mod/core.html#errordocument

Apr 27 2008, 05:46 AM | Post #7
HR 4 | Group: Active Members | Posts: 113 | Joined: 28-May 04 | From: Scotland | Member No.: 3,740
Aha, that is very interesting. I will definitely take out the absolute URL then, but before I do...
I have had another look at the logs and am very puzzled by something. I found a dodgy pdf that was no longer there, which was triggering a few hits to www.mysite.com/custom404.htm, BUT there are also a load of these (which must be bumping up the stats):
GET /custom-404.htm
which is returning a 200 code.
Does that mean that somewhere I have a link to that file that the bots are finding? I just have the custom-404 file in the root, but as far as I can see there are no links to it other than the absolute URL within the .htaccess.
Thanks for all your help so far
Jo

Apr 27 2008, 09:37 AM | Post #8
High Rankings Advisor | Group: Admin | Posts: 29,201 | Joined: 21-July 03 | From: Ashland, MA | Member No.: 2
Your 404 may not be returning a 404 header response, but a 200 ok one instead. You'll definitely want to make sure it's returning an actual 404 so that it doesn't get indexed (under multiple URLs no less!).
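A header checker will show the response code. As a minimal sketch, the same check can be done with Python's standard `http.client`, which does not follow redirects, so a 302 shows up as a 302. The function names `fetch_status` and `describe` are made up for this example.

```python
import http.client

def fetch_status(host, path, port=80, timeout=10):
    """Return (status, Location header) for one GET, without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"User-Agent": "status-check-sketch"})
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def describe(status):
    """Rough reading of a status code when testing a URL that should not exist."""
    if status == 404:
        return "404: correct for a missing page"
    if status in (301, 302):
        return "redirect: the visitor is being sent on to another URL"
    if status == 200:
        return "200: fine for a real page, wrong for a missing one"
    return "other status: check the server configuration"
```

To use it, call `fetch_status("www.mysite.com", "/some-made-up-page.htm")` with your own host and a URL you know does not exist, then pass the status to `describe`. A correctly configured site should report 404 for the made-up URL.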

Apr 27 2008, 08:44 PM | Post #9
Lost in Translation | Group: Moderator | Posts: 2,202 | Joined: 5-August 03 | From: Sydney Australia | Member No.: 283
Get Webbug for that.

Apr 28 2008, 03:57 PM | Post #10
HR 4 | Group: Active Members | Posts: 113 | Joined: 28-May 04 | From: Scotland | Member No.: 3,740
You are dead right - it is returning a 200. Not only that, a made-up URL (i.e. a page that does not exist) is returning a 302!
Time to get on to the hosting company?
Jo

Apr 28 2008, 03:59 PM | Post #11
High Rankings Advisor | Group: Admin | Posts: 29,201 | Joined: 21-July 03 | From: Ashland, MA | Member No.: 2
QUOTE
Time to get on to the hosting company?
Yes, if they have control of your 404 error page, and .htaccess file, etc.

Apr 28 2008, 04:30 PM | Post #12
Convert Me! | Group: Admin | Posts: 17,378 | Joined: 17-August 03 | Member No.: 551
Did you change the ErrorDocument instruction in your .htaccess, in case yours is one of those servers that automatically delivers a 302 if the full URL is given?
If not, I'd try that first, following the example given by projectphp above. Then re-test a non-existent URL. If you still get a 302 it'll be time to get on to the host. They may have something in the virtual host configs that we can neither see nor change.