Large Website File Structure- What's Best For Se's?
#1
Posted 02 January 2007 - 08:05 PM
I operate a commerce site that has grown beyond our original plans. We sell items that, once sold, are never taken off the site, so that their owners and third parties can use the pages for research. The site is heavily linked to. Problem is, years ago, we set up the file structure on the server so that all htm files were placed in root and all images were placed in a directory called 'images'. Now we have over 4000 pages, so there are that many files in root (I can see you all wincing now...) and even more image files in the separate image directory. If I restructure the site and move things, I risk losing a serious number of links from outside sites, and redoing everything would take an extraordinary amount of time. The site is only going to grow, so I plan to put new files in a better structure and leave the old ones where they are.
Knowing now that it's bad to have so many files under a directory and root especially, I have two main questions.
1 - Is it bad to have too many DIRECTORIES from root for a search engine to spider the site with ease?
2 - Is it best to structure the site with subdirectories by keeping image files and htm files separate in their own subdirectories rather than mixing in one directory?
I am trying to figure out what file structure on the server is most friendly to a search engine when spidering. (Ease of customer navigation is not the issue here, just how we set up the site on the disk.)
Thanks
#2
Posted 02 January 2007 - 08:17 PM
1 - Is it bad to have too many DIRECTORIES from root for a search engine to spider the site with ease?
2 - Is it best to structure the site with subdirectories by keeping image files and htm files separate in their own subdirectories rather than mixing in one directory?
It's not bad for spiders at all. They don't care a whit about your file structure- I've had pages that were indexed and ranked well that appeared to be seven folders deep.
Structure your files how it makes sense for YOU. I can't imagine trying to find something in a folder with 8000 items! But there's nothing inherently wrong about it from a search engine point of view.
Personally, I like to give images their own folder but it's a housekeeping thing and certainly has no advantages either way when it comes to SEO.
What search engines care about is how many links it takes them to get to the page.
Example:
www.domain.com/file.htm
www.domain.com/folder/anotherfolder/thirdfolder/fourthfolder/file.htm
are both linked from the home page- they both get the same "weight" or vote from the home page.
However,
www.domain.com/somefile.htm,
even though it's in the root directory, isn't linked from the home page. The user has to click
www.domain.com/category/ then
www.domain.com/subcategory/
to reach it, so it's 3 clicks away from the home page. It may be many visits before a search engine spider ever reaches that page (because they follow every link on prior pages) and it's linked from a lower importance page, so they naturally determine that it's not as important as the other pages on your site.
Does that make sense?
Customers and search engine spiders navigate your site in similar ways, so if you create a structure that easily allows people to navigate your site, it works for the engines too. The actual structure of the URL/folders really has no bearing on anything (other than your own sanity when trying to find something!)
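To make the "clicks from the home page" idea concrete, here is a minimal sketch that measures click depth with a breadth-first search over a link graph. The `links` dictionary and the function name `click_depths` are made up for illustration; the URLs mirror the example above, and a real site would need its link graph collected by a crawl first.

```python
from collections import deque

def click_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in a BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: which pages each page links to.
links = {
    "/": [
        "/file.htm",
        "/folder/anotherfolder/thirdfolder/fourthfolder/file.htm",
        "/category/",
    ],
    "/category/": ["/subcategory/"],
    "/subcategory/": ["/somefile.htm"],
}

depths = click_depths(links, "/")
for url, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, url)
```

Both file.htm pages come out at depth 1 regardless of how many folders appear in the URL, while /somefile.htm comes out at depth 3 even though it sits in the root directory.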
#3
Posted 02 January 2007 - 08:28 PM
In my case, I actually have links on the home page in a FrontPage shared border that may entail several clicks from the home page to the final page. I also link each of these items from a "recent additions" page (one click from home) when it is first placed on the site. The SE's love my site, but I don't want to change that slowly over time by causing crawlers undue issues compared to the site's simplicity now.
#4
Posted 02 January 2007 - 08:39 PM
#5
Posted 02 January 2007 - 09:01 PM
Are shared borders in frames? If not, then they are probably fine. I would think these days they use include files, which assemble the page before sending it to the browser- in which case, the end result is a single page which is fine.
Simplicity is a good thing, no matter how you cut it. I don't think your server cares about the number of files in a folder either, to be honest. If things are going well and you don't mind the mass of files in the root... I'd leave well enough alone.
As a long term strategy though, I think you'd do well to have a better organizational structure just for your own needs in finding and editing, etc.
You should also probably consider a site map, linked from the footer. It's a good way to get all your pages accessible, no matter where they are located in your site structure.
#6
Posted 02 January 2007 - 09:14 PM
Indexing thousands of files in a single directory on a drive is not as fast for a server compared to having them chopped up in subdirectories. I have noticed this myself on my own drive. I find files easily by clicking on the list and just typing in the first few letters. All my htm and image files are coded by the catalog number so it's easy to find them this way.
I don't know if having a ton of directories is the same as a ton of files in a directory. If anyone else knows this from a file retrieval speed issue, please let me know. I guess from a SE perspective, it doesn't matter.
#7
Posted 02 January 2007 - 10:23 PM
It's not bad at all. If you change things now, you will have many months of pain. So unless you absolutely positively have to change your URLs, I would highly recommend against doing it.
#8
Posted 02 January 2007 - 11:18 PM
Hi Jill,
I was informed by one of the techies who host the site that so many files (4000+) in my root directory, or in any directory, is bad for indexing by the server as far as efficiently finding a file for a request. I was not going to move anything, just start from today with a more efficient file structure. My main concern was whether there is some structure that spiders don't like compared to others.
thanks
#9
Posted 02 January 2007 - 11:37 PM
I don't believe that is true.
#10
Posted 03 January 2007 - 12:55 AM
Unix-type systems, just like Microsoft ones, can go either way. I haven't checked in a few years, but last time I did, Red Hat Linux was still using the older linked list. Randy might be able to corroborate that? Since Linux itself uses a Virtual File System, other vendors can very easily connect to other, potentially better, file systems.
In a linked list, an entry is an entry is an entry. So, yes, directories are the same as files. Whether your directory holds 10K files or 10K sub-directories, the effects on (in)efficiency are the same.
You are NOT necessarily going to get a lot more efficiency out of a more complicated structure. Some, but not a lot.
If you have to find a file in a single directory with N entries, as we already said, it's going to take on average N/2 reads to do it. If you bury that file three directories deep, it's still going to take X/2 + Y/2 + Z/2 reads, where the variables represent the number of entries in each of the three directory levels. Unless you actually create a directory structure that resembles a binary tree, something that isn't always easily done, you're not going to gain a lot. Some, but not a lot.
That being the case, I agree with Jill, you certainly aren't going to gain enough to offset what you will definitely be losing with already established URLs. My advice would be to do it only on a go-forward basis and "try" to design a structure with multiple branches (nodes) at every level. You can use the formulas above (N/2 and X/2 + Y/2 + ...) to get a feel for how much efficiency you are gaining.
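As a rough way to try the formulas above, the sketch below adds up the average reads per directory level under the linear-scan model described in this post. The function name `expected_reads` and the example layouts are made up for illustration; a file system that indexes its directories differently will not follow this model.

```python
def expected_reads(*entries_per_level):
    """Average reads to find one file, assuming each directory is scanned
    linearly (N/2 reads on average for a directory with N entries)."""
    return sum(n / 2 for n in entries_per_level)

# One flat directory holding 4000 files.
flat = expected_reads(4000)

# Two subdirectories under root, each holding 2000 files.
two_big_folders = expected_reads(2, 2000)

# Three levels with 16 entries each (room for 16 * 16 * 16 = 4096 files).
balanced = expected_reads(16, 16, 16)

print(flat, two_big_folders, balanced)
```

Under this model, splitting the files into a couple of large folders only halves the figure, while a layout with several branches at every level cuts it much further, which is why the shape of the structure matters more than the mere existence of subdirectories.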
p.s. On a modern computer, 4K files in a single directory is a lot, but it's not really a LOT. I frequently run applications that have 10K files in a directory. You've probably picked a very good time to start addressing the problem because, in my opinion, you're just entering the edge of concern.
#11
Posted 03 January 2007 - 08:21 AM
It was as of RHEL 3, Ron. In fact, there was a bug that needed to be addressed with regard to corrupted linked lists in 3, if memory serves. I've not checked RHEL 4, but would suspect that it too uses linked lists. Such an alteration of the base code would have been a major announcement in the Changelog if they'd changed it, and I haven't seen that one floating around anywhere.
FTR, I agree completely with all of Ron's conclusions.
#12
Posted 08 January 2007 - 03:34 PM
#13
Posted 21 January 2007 - 05:50 PM
You said in your statement:
It's not bad at all. If you change things now, you will have many months of pain. So unless you absolutely positively have to change your URLs, I would highly recommend against doing it.
How long a period of time are you talking about?
Also, could I have left the home page as default.asp instead of changing it to .html as it is now?
#14
Posted 22 January 2007 - 01:18 PM
Can anyone help with my question? My original site was built with software, and the home page had a URL like domainname.com/default.asp. I have now changed it to domainname.com/index.html
Should I (and could I) have left the home page name as it was and changed only the others to static html pages?
How long does it usually take G to sort this out and assign page rank? Is there anything I can do?
Perkle
#15
Posted 22 January 2007 - 04:05 PM
You'll probably want to 301 the default.asp page over to the index.html page for starters, just in case anyone (or even you) were linking to it with the page filename as part of the link. If all of the rest of the page names were different as well, you can also redirect them to the most appropriate new page.
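For illustration only, here is a minimal sketch of the old-to-new mapping that a 301 setup boils down to. The `REDIRECTS` table and the `resolve` function are hypothetical names, and the single entry is the one from this thread; in practice the redirect would be configured in the web server or hosting control panel rather than in a script like this.

```python
# Hypothetical table of old URLs and the pages that replaced them.
REDIRECTS = {
    "/default.asp": "/index.html",
}

def resolve(path):
    """Return (status, location) for a requested path.

    Old page names get a permanent (301) redirect to their replacement;
    anything else is served as-is.
    """
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target
    return 200, path

print(resolve("/default.asp"))
print(resolve("/index.html"))
```

If the other page names changed as well, each old name would get its own entry pointing at the most appropriate new page.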
The rest is going to depend upon how long ago you changed the structure. You don't want to be flip flopping back and forth too much because you'll just confuse the spiders and start the re-indexing process all over again.
Changing page names can be painful in the short term, but isn't nearly as bad as changing the domain name.