
Can I Clean Up My .htaccess?


14 replies to this topic

#1 luv2learn

luv2learn

    HR 2

  • Members
  • PipPip
  • 11 posts

Posted 25 October 2009 - 10:12 AM

How long should I keep 301's in my .htaccess file?

99% of my 301's are where we've changed the file name of the page (and the majority of those before we understood much about how to build a site correctly).

The old file names in question were not being linked to from external sites, just our internal structure before the changes, so after I've made webmaster tools aware of the change - can I safely delete the 301's?

#2 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 25 October 2009 - 11:27 AM

You'll probably want to check your log files before removing them, luv2learn. On a standard *nix/Apache system you can see them in the access_log file by searching it for 301 status codes. The status code is a three-digit number that comes just after the quoted request line, which ends with the HTTP version information. So an example of a 301'd hit might look something like this (the status code is the 301 right after "HTTP/1.1"):

QUOTE
65.55.105.122 - - [25/Oct/2009:06:05:18 -0400] "GET /folder/somepage.php HTTP/1.1" 301 619 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"


That's a sort of real one from one of my sites, for a page that was redirected almost two years ago that MSNbot continues to try to access. It never had anything other than a couple of internal links pointing at it, either.

I'll note that having a clean .htaccess is a good thing. But by the same token if nothing ever tries to hit those MIA pages it's not really costing you much if any overhead. Unless you have hundreds of them. In which case you may even want to consider getting them out of your .htaccess and performing a scripted redirect instead.

#3 luv2learn

Posted 25 October 2009 - 12:35 PM

Thanks Randy,

I don't have hundreds so I guess it's not that great an issue then. It's just that when I go in there to add another I'm always afraid of messing up the .htaccess some way, because there are quite a few in there now.

Cheers for the heads up on the access log. I've looked at it before and never truly understood what it's about, but seeing your example helps it make a little sense. I'm gonna check every so often since, as you say, a couple of years down the line MSN is still trying to fetch the old page.

I love learning new things about all this stuff - cheers Randy.


#4 Randy

Posted 25 October 2009 - 08:46 PM

Well, if you want to know what all of that stuff in the line above means:
  • 65.55.105.122 - This is the IP address of the connecting computer.
  • - - - Usually nothing. The first dash is an identd lookup that's almost never used. The second is the username: if you had .htaccess user logins set up and someone was logged in, you'd see their username show up here.
  • [25/Oct/2009:06:05:18 -0400] - That's the date and time stamp from my server, including its offset from UTC.
  • GET - That's the HTTP method being used. Most will be GET, though if you have a POST-method form set up, its submissions would show as POST instead.
  • /folder/somepage.php - That's the page address being requested. It'll show variables too if someone is requesting a dynamic page with variables in the URL.
  • HTTP/1.1 - The HTTP version of the request. Most will be HTTP 1.1 these days. HTTP 1.0 was buggy as all get out with some things, like error status codes. So it's a good thing for the bots to now use 1.1.
  • 301 - The status code the server sent back, as mentioned earlier.
  • 619 - That's the size of the response in bytes, not counting the headers. It covers just the page code (you'll see separate hits for images and such that are embedded in the page). In my case my custom 301 "page" that no one actually sees is apparently 619 bytes in size.
  • - - Nothing in this case. But if someone came from, say, Google via a search, or clicked through from another page of my site, this is where you'd see the referrer, meaning the previous page. That includes search phrase information if you know what to look for in the query string of the Google/Yahoo/Bing/etc. referral URLs.
  • msnbot/1.1 (+http://search.msn.com/msnbot.htm) - This is the user agent information. In this case it was a spider hit from MSN/Live/Bing's crawler. If it was a normal surfer using, say, Internet Explorer or Firefox, you'd see some basic information about the user's browser here.
See, more than you ever wanted to know about what all of that stuff means. But there is all sorts of info hidden away in log files if one ever needs to know it.
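
For reference, a line laid out like the one above is what Apache's standard "combined" log format produces. Here is a sketch of the directives that typically generate it. They belong in the main server or virtual host configuration, not in .htaccess, and the logs/access_log path is an assumption that varies from host to host.

```apache
# Server config or <VirtualHost> only; LogFormat and CustomLog are not allowed in .htaccess.
# %h   client IP address (or hostname, if hostname lookups are on)
# %l   identd logname, almost always "-"
# %u   authenticated username, "-" when nobody is logged in
# %t   time the request was received
# %r   first line of the request: method, path, HTTP version
# %>s  final status code (the 301 in the example)
# %b   response body size in bytes, excluding headers
# The last two fields are the Referer and User-Agent request headers.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined
```

Some hosts put extra fields at the front, such as the virtual host name, so the exact layout of your own log may differ a little from this.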

#5 luv2learn

Posted 26 October 2009 - 12:23 PM

Very interesting Randy,

Had a look into my error log - HUGE!
But from your examples I can see it's Google requesting the information, and it's still requesting the old files. Not sure when they'll stop and start requesting the correct ones, but I live & learn.

Cheers

#6 Randy

Posted 26 October 2009 - 02:06 PM

If Google is still requesting the old pages and they've been redirected for a while, you'll probably want to double check the status code on the redirects. On Apache, both the Redirect directive and mod_rewrite's R flag default to a 302 instead of a 301 if you do not expressly declare they should be 301's. And you want them to be 301's.

As a very general rule Google seems to pick up on proper 301 redirects and stop going to the old MIA pages in fairly short order. So if it's been more than a few weeks it might indicate something else going on. The only times I've seen Googlebot keep going to MIA pages that have been properly redirected are cases where there are still links pointing to the page in question from somewhere.
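
A minimal sketch of declaring the 301 explicitly in .htaccess, assuming mod_alias or mod_rewrite is available on your host. The page names and www.example.com are placeholders, not anything from this thread. Use one form or the other for a given page, not both.

```apache
# mod_alias: without the 301 (or the keyword "permanent") this would send a 302.
Redirect 301 /old-page.html http://www.example.com/new-page.html

# mod_rewrite alternative: a bare [R] flag also sends a 302, so spell out R=301.
RewriteEngine On
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```

Either way, the access log should then show 301 rather than 302 as the status code for hits on the old address.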

#7 luv2learn

Posted 26 October 2009 - 02:57 PM

To be fair my recent 301's were only done a couple of weeks ago so I'm probably not giving it enough time yet as you say.
My logs have many "justaninstaller" logged - is that anything to worry about?

#8 Randy

Posted 26 October 2009 - 03:37 PM

No clue what that one is. Is it showing up in the User Agent section at the end?

I've certainly never heard of it before. Which makes me wonder if it's not something that's specific to some CMS or other software you have installed, or perhaps something your hosting company has running.

#9 luv2learn

Posted 27 October 2009 - 04:43 AM

QUOTE
example.co.uk 79.170.40.8 - - [27/Oct/2009:09:39:20 +0000] "GET http ://example.co.uk/ftpkjHx0v.cgi HTTP/1.1" 301 311 "-" "JustAnInstaller/0.1"


That's the line which appears in the error log. I don't know what it is but I think it may be connected to my host in some way.

#10 Randy

Posted 27 October 2009 - 09:27 AM

Just wanna confirm you're seeing this in your error log. Is that part correct?

If so, they're not finding what they're looking for. The fact that they're looking for a cgi file (which normally has Execute rights) to me would be a strong hint that it's a script kiddie looking for a weakness. I see that kind of stuff all the time because I do watch log files a lot more than most. More times than not they're looking for a way to set up a spam bot on your server.

From a server security perspective I'd be a lot more concerned if it were finding and executing files that were actually there. All they're doing right now if they're not finding anything is burning up a small bit of bandwidth. Can you block them? Yes. Is it worth doing? Sometimes yes, most times no. It sort of depends upon the specifics.

If you wanted to block 'em by the JustAnInstaller user agent in your .htaccess that would look like

CODE
# Return 403 Forbidden to any client whose user agent starts with JustAnInstaller
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^JustAnInstaller
RewriteRule ^ - [F]


You'll still see them in your logs, but they'll get a 403 Forbidden response code on the other end.

#11 luv2learn

Posted 27 October 2009 - 06:12 PM

Actually I must have been mistaken. Just been back to look and it's not in the error log but the access log which is more worrying for me.

I don't have any files that match that I can see either.

#12 Randy

Posted 27 October 2009 - 08:22 PM

Well, they're getting a 301 status, which means the requests are being redirected somewhere else. Do you see another entry for the same IP address right after these showing where they ended up? Or do you 301 redirect all errors? Or do you have a non-www redirect set up? (I notice the original request is non-www, but don't know if you edited that out or not.)

If they're not getting to a real file it's probably nothing to worry about. That said, it's never a bad idea to question these odd things when you notice them showing up.
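
To illustrate the non-www case: a canonical-host rule along these lines would 301 every request made to the bare domain, including requests for files that don't exist, which would account for the 301 in the quoted log line. This is only a guess at what might be in the .htaccess, and www.example.co.uk is assumed to be the preferred host name.

```apache
RewriteEngine On
# Match requests whose Host header is the bare domain (case-insensitive)
RewriteCond %{HTTP_HOST} ^example\.co\.uk$ [NC]
# Send them to the same path on the www host with a permanent redirect
RewriteRule ^(.*)$ http://www.example.co.uk/$1 [R=301,L]
```

With a rule like this the junk request gets a 301 first, and the follow-up request to the www host is the one that returns the 404.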

#13 luv2learn

Posted 30 October 2009 - 11:42 AM

Did some digging past couple of days randy,

Seems to be connected with my host who've assured me I needn't worry. Thanks for all the insights!

#14 Hank Cowdog

Hank Cowdog

    HR 4

  • Active Members
  • PipPipPipPip
  • 113 posts
  • Location:Chair, Den, Wylie, Outside Dallas, Texas, USA, North America, Earth, Sol, Milky Way

Posted 03 November 2009 - 03:31 PM

QUOTE(luv2learn @ Oct 27 2009, 06:43 AM)
example.co.uk 79.170.40.8 - - [27/Oct/2009:09:39:20 +0000] "GET http ://example.co.uk/ftpkjHx0v.cgi HTTP/1.1" 301 311 "-" "JustAnInstaller/0.1"


Why does this generate a 301 instead of a 404?

If this is just some random page request, and not a legit request for an old URL from the site, I would worry that your redirection rules are too broad.

You should redirect old content to new locations, but you should not redirect ALL requests to some new location. If the URL NEVER existed, it should generate a 404 error, not be redirected via 301.

Overuse of 301's can prevent the search engines from pruning invalid URLs from their databases, lead to all kinds of duplicate content issues, and is lazy coding.
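
A rough sketch of the difference, using hypothetical file names since the actual rules weren't posted. The commented-out catch-all is the kind of rule that is too broad. The per-page redirect plus a genuine 404 is the narrower alternative.

```apache
# Too broad (hypothetical): every missing file gets 301'd to the home page,
# so URLs that never existed never return a 404.
# RewriteEngine On
# RewriteCond %{REQUEST_FILENAME} !-f
# RewriteRule ^ http://www.example.com/ [R=301,L]

# Narrower: one rule per page that really moved.
Redirect 301 /old-page.html http://www.example.com/new-page.html

# Everything else falls through to a real 404, optionally with a custom page.
ErrorDocument 404 /not-found.html
```

A site-wide non-www to www redirect is a separate matter. It 301s junk URLs too, but the second request still ends in a 404, so it doesn't cause the problems described above.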

#15 luv2learn

Posted 05 November 2009 - 09:45 AM

Hi Hank,

I can't claim to understand everything my host told me, other than that 'justaninstaller' is used when I install one of the many scripts for the first time through my control panel. While I don't go script mad, there may have been a couple of times I tried one of their scripts. I think the last one I tried was an auto set-up of WordPress.

Other than that I'm not too great at understanding the ins & outs of the log, but they did say I had nothing to worry about.




