High Rankings Search Engine Optimization Forum

Writing A Search Engine? I'm going to write my own search engine.
Nathan Malone
post Jun 23 2004, 12:24 PM
Post #1


Hi, I'm going to write my own search engine (note, there is no "try" in there) and I was wondering if any of you had any suggestions for anything from coding to algorithm to design to anything else. I have Apache, PHP, and MySQL installed on my PC, so I was just going to set up a small engine on my PC and tweak it there before expanding it to include the whole web (x years down the road). I'm also going to use it as a "site-search" feature for my sites. Does anyone have any advice? I already know several things:

1. It will be hard.
2. It will be very hard.
3. It will use up tons of bandwidth, hard drive space, and other server resources.
4. I will not be able to host another "google" on my little pc.
5. To succeed, it will have to be different from the thousands of other small search engines all over the web.
6. It will be spam-free (or as spam-free as I can make it).
 
Randy
post Jun 23 2004, 01:15 PM
Post #2


Good luck with the project Nathan! Nothing to add, other than that it will take you many, many hours.

On the other hand, if you hit on something all of the PhDs have missed for controlling spam, you'll likely be a millionaire practically overnight. ;)
 
respree
post Jun 23 2004, 02:30 PM
Post #3


I would recommend developing a long-term business plan.

There are two main issues I think you'll need to think long and hard about.

- Given the thousands of smaller search engines already out there, what features will your product have that the others do not? To achieve any level of success, you'll need a unique selling proposition.

- Do you have a clear vision of a business model that will support the costs associated with indexing, storing and serving up the 'world's information?'

I don't intend to discourage you; I'm just offering some issues to consider. Good luck in your endeavor.

This post has been edited by respree: Jun 23 2004, 02:59 PM
 
Haystack
post Jun 23 2004, 02:51 PM
Post #4


Good luck, Farmernate. You might not be willing to share this, but I'm curious to know if you have some fresh ideas on how to determine relevancy of pages.
 
Nathan Malone
post Jun 23 2004, 03:30 PM
Post #5


Well, yes. I was thinking about it last night for several hours and came up with three new things I could do, which will remain unposted for now (sorry!). What I really need to do, though, is build a small test engine that indexes only one or two hundred sites, and then use that small database to tweak it and see if it works. If it doesn't, oh well; if it does, I just might make the next Google. The odds are stacked against me, of course, but it never hurts to try. That's how Google got started, and that just might be how I'll get started.
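For a small test engine like this, one common way to organize the data is an inverted index: a map from each word to the pages that contain it. Below is a minimal in-memory sketch in Python, assuming pages have already been fetched and stripped to plain text; the function names and sample URLs are hypothetical, and a real engine would keep the postings in MySQL or on disk rather than in a Python dict.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it.

    `pages` is a dict of {url: plain_text}; fetching and HTML
    stripping are assumed to have happened already.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs containing every query word (boolean AND), sorted."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


pages = {
    "a.example/apache": "Configuring Apache with PHP",
    "b.example/mysql": "Tuning MySQL for PHP sites",
    "c.example/seo": "Search engine optimization basics",
}
index = build_index(pages)
print(search(index, "php"))        # ['a.example/apache', 'b.example/mysql']
print(search(index, "PHP mysql"))  # ['b.example/mysql']
```

This only finds the matching set with simple AND logic; relevance ranking would be layered on top of it.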

Anyway, thanks everyone for the advice and encouragement. I'll be sure to post here when I have a functional search engine up. It'll be several months at least before I'm finished tweaking it, so don't expect it anytime soon. If all else fails (and, knowing the odds, it probably will), I can at least use it as a site-search for my websites: I have developed or am developing eleven sites, seven of which have the potential to become very big.

Well, off I go to review my Apache books and brush up on PHP...
 
Haystack
post Jun 23 2004, 04:34 PM
Post #6


This might help:
http://www.google.com/programming-contest/
 
Scottie
post Jun 23 2004, 05:00 PM
Post #7


Go Nathan! Let us know when you have something we can play with. ;)
 
rohgan03
post Jun 23 2004, 05:06 PM
Post #8


Way to go! Nice to see someone thinking differently. Hope you succeed.
 
Nathan Malone
post Jun 23 2004, 07:14 PM
Post #9


Thanks for the encouragement. I now know how I'm going to build it; I just need to figure out how I'll organize the database and then work on the exact algorithm. I have what I think is a good base algorithm (like PageRank is for Google), but of course I will also need the standard stuff. We'll see how it goes...
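Since PageRank comes up as the model for a "base algorithm", here is the textbook power-iteration form of it as a point of comparison. This is a sketch, not Google's production system and not Nathan's unpublished idea; the damping factor of 0.85 is the value commonly cited from the original PageRank paper, while the iteration count and the sample link graph are arbitrary assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank-style scores.

    `links` maps each page to the list of pages it links to.
    Pages with no outgoing links ("dangling" pages) spread their
    score evenly over all pages, so the scores keep summing to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page gets the "random jump" share first
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


links = {"home": ["about", "blog"], "about": ["home"], "blog": ["home", "about"]}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this tiny graph "home" ends up highest because both other pages link to it.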
 
bauguss
post Jun 24 2004, 02:58 PM
Post #10


Coding-wise:

PHP is wonderful. It is my favorite language. The school side of me says that writing a search engine (one meant to compete with the likes of Google someday) in PHP is not a good choice: it gives up too much performance. Raw C or C++ is just so much more optimizable.

However, doing your prototype in PHP is definitely a good option.

Now, the PHP-lover side of me says PHP would be an awesome choice. A LAMP system (Linux, Apache, MySQL, PHP) is extremely scalable across many machines, with the ability to create a kind of intelligence where each individual node is aware of the overall system. Not only that, the servers don't even need to be in the same location. Done right, you could have an installable app that, once installed, instantly becomes part of the network. (Not to mention that PHP is basically a scripting language on top of C anyway: everything you do in PHP is ultimately executed by an engine written in C, and you could also get a PHP compiler to turn your PHP code into bytecode. It gets better and better.)
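The post doesn't spell out how the nodes would split the work. One common pattern is to hash each URL to pick the node that stores it, then send every query to all nodes and merge their top results (scatter-gather). The sketch below simulates that inside a single Python process; `Node`, `node_for`, and `distributed_search` are hypothetical names, and real nodes would be separate servers talking over the network.

```python
import hashlib
import heapq


class Node:
    """One hypothetical index server holding a slice of the documents."""

    def __init__(self):
        self.docs = {}  # url -> (lowercased text, score)

    def add(self, url, text, score):
        self.docs[url] = (text.lower(), score)

    def query(self, word, k):
        """Return this node's top-k (score, url) pairs whose text contains `word`."""
        hits = [(score, url) for url, (text, score) in self.docs.items()
                if word in text.split()]
        return heapq.nlargest(k, hits)


def node_for(url, num_nodes):
    """Pick a node deterministically by hashing the URL."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distributed_search(nodes, word, k=10):
    """Scatter the query to every node, then gather and merge the results."""
    partial = []
    for node in nodes:
        partial.extend(node.query(word.lower(), k))
    return [url for score, url in heapq.nlargest(k, partial)]


nodes = [Node() for _ in range(3)]
corpus = [("a.example", "php search engine", 0.9),
          ("b.example", "mysql php tuning", 0.7),
          ("c.example", "apache config", 0.5)]
for url, text, score in corpus:
    nodes[node_for(url, len(nodes))].add(url, text, score)
print(distributed_search(nodes, "php", k=2))  # ['a.example', 'b.example']
```

Because each node returns only its own top k, the merge step stays small no matter how many documents each node holds.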

I do have an idea for you as well. Why couldn't a search engine behave similarly to Slashdot, where you have moderators, meta-moderators, and regular users (who typically do the meta-moderating to keep the moderators in check)? Kind of like DMOZ, except better (in DMOZ an editor can quickly become too powerful or too overworked). Then all the listings have a rating system where every user can have input into the usefulness of the listing.

This technique combined with others like PageRank should balance out really nicely. (Take a look at a Slashdot post and at how its comments are rated, with labels ranging from "Interesting" to "Flame Bait" to "Redundant".)
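To make "balance each other out" concrete, here is a sketch that averages Slashdot-style labels into a rating and blends it with a link-based score such as PageRank. The label weights, the 0..1 link-score scale, and `rating_weight=0.3` are all hypothetical tuning assumptions, not values used by Slashdot or Google.

```python
# Hypothetical weights for Slashdot-style moderation labels (-1..1).
LABEL_WEIGHTS = {"Interesting": 1.0, "Redundant": -0.5, "Flame Bait": -1.0}


def rating_score(labels):
    """Average the label weights; 0.0 when a listing has no ratings yet."""
    weights = [LABEL_WEIGHTS[label] for label in labels if label in LABEL_WEIGHTS]
    return sum(weights) / len(weights) if weights else 0.0


def combined_score(link_score, labels, rating_weight=0.3):
    """Blend a link-based score (0..1) with the user rating (-1..1)."""
    return (1 - rating_weight) * link_score + rating_weight * rating_score(labels)


listings = {
    "useful.example": (0.40, ["Interesting", "Interesting", "Redundant"]),
    "spammy.example": (0.60, ["Flame Bait", "Flame Bait", "Redundant"]),
}
ranked = sorted(listings, key=lambda u: combined_score(*listings[u]), reverse=True)
print(ranked)  # ['useful.example', 'spammy.example']
```

Here the user ratings are enough to lift a page with fewer links above a better-linked page that users flagged as junk.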

And good luck. I've thought about this myself but it is always about time.

Oh, another idea is simply to use Google's existing results and apply your own additional ideas to them. You could start with the Google API and, if you come up with something usable, get a Google license (see a9.com by Amazon).
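Re-ranking someone else's results can be as simple as blending each result's original position with your own per-URL score. A minimal sketch, assuming you already have an ordered list of URLs (how it is fetched is left out) and your own scores on a 0..1 scale; the `1 / (position + 1)` conversion and the 50/50 weight are arbitrary choices for illustration.

```python
def rerank(external_results, own_scores, own_weight=0.5):
    """Re-order an externally ranked list using your own per-URL scores.

    `external_results` is a list of URLs, best first.
    `own_scores` maps URL -> score in 0..1; URLs without a score get 0.0.
    Position i (0-based) is converted to a score of 1 / (i + 1).
    """
    def blended(item):
        position, url = item
        position_score = 1.0 / (position + 1)
        return (1 - own_weight) * position_score + own_weight * own_scores.get(url, 0.0)

    ranked = sorted(enumerate(external_results), key=blended, reverse=True)
    return [url for _, url in ranked]


results = ["x.example", "y.example", "z.example"]
print(rerank(results, {"z.example": 0.9}))  # ['z.example', 'x.example', 'y.example']
```

With no scores of your own, the original order is kept, so the re-ranker can only move results when your signals have something to say.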

Back to LAMP: you may want to write down in your plans to become an expert at compiling the three programs (Apache, MySQL, and PHP). There is a lot you can do at compile time to create a faster, more robust PHP. Your long-term plans could also include making your own PHP distribution where you gut out everything that isn't needed for the search engine. (It has tons of functions you will never use; some you can control through compile options, others are built in.)

Josh
 
Nathan Malone
post Jun 24 2004, 05:04 PM
Post #11


Whoa! Thanks for all the great information, Josh! I do have a few questions about the above post, though:

QUOTE
PHP is wonderful. It is my favorite language. The school side of me says doing a search engine (that is meant to be competitive with the likes of google someday) in php is not a good choice. Too much on the performance side of things. Using raw C or C++ is just so much more optimizable.

However, doing your prototype in PHP is definitely a good option.


Yes, I have been wondering about that. I had already decided to build it in PHP at first and then possibly switch over if I actually develop something that works well enough to warrant it. PHP obviously wouldn't affect the way the algorithm works, only the performance, which I'm not worried about at this point. Getting the algorithm just right will be hard enough without having to learn a whole new language to write the engine in, so I'll write it in PHP for now and possibly switch later if performance takes too many hits...

QUOTE
Now, the PHP lover side of me says PHP would be an awesome choice. A LAMP system (Linux, Apache, MySQL, PHP) is extremely scalable across many machines with the ability to create a kind of intelligence where each individual node is aware of the overall system. Not only that, the servers don't even need to be in the same location. Done right, you could have an installable app that once installed becomes instantly part of the network. (not to mention, PHP is basically a scripting language on top of C anyway...everything you do in PHP is ultimately executed by an engine written in C...and...you could also get a PHP compiler to turn your php code into bytecode...it gets better and better)


Yes, I too am a great fan of the "LAMP" combo. Although I know the basics of programming in ASP and Perl, I think PHP and MySQL are a lot better, and I am much more familiar with them. I do have a question about using MySQL, though. A search engine would obviously need a lot of performance from any database it uses, and with the size of the web now, the database would have to be huge to hold all the sites. Would MySQL be able to handle the load, or would another database work better? Either way, I'll probably use MySQL for at least the first few versions of the engine, but I'm curious whether I will need to switch in the future.

QUOTE
I do have an idea for you as well. Why couldn't a search engine behave similar to Slashdot. Where you have moderators, meta moderators, and then regular users (who typically do the meta moderating to keep the moderators in check) Kinda like DMOZ except better (where an editor can quickly become too powerful or too overworked) Then the listings all have a rating system where all users can have input into the usefulness of the listing.


I have already been planning on letting ordinary users vote on the search results, but the idea of having "moderators" is so much better! I guess that's kind of the way Google works, except they call their moderators employees, and those moderators often aren't very good (or are overworked). LOL

QUOTE
This technique combined with others like page rank should be able to balance each other out really nicely.


Isn't PageRank patented? Speaking of patents, does anyone here know how to file a patent and how much it would cost? I think I might actually have a few patentable ideas...

QUOTE
Oh, another idea is simply using Google's existing results and applying your own additional ideas to it. You could start with the Google API and if you start to come up with something that becomes usable, get a google license. (see a9.com by Amazon)


Well, that might be easier but I really want to develop my own engine and I really don't think that I could stand a chance competing with Amazon, Google, Yahoo, and the rest if I used Google's own technology. I think that being original is a plus in this field right now as the competition is so fierce.

QUOTE
Back to LAMP, you may want to write down in your plans to become an expert at compiling the 3 programs. There is a lot you can do to optimize compilation to create a faster more robust php. Also in your long term plans could be to make your own PHP distribution where you gut out all the stuff that just isn't needed for the search engine. (it has tons of functions you will never use...some you can control through compile options, others are built in)


Sounds good! I already have a basic knowledge of PHP and MySQL, although I certainly could improve it, and I just started an Apache book yesterday. I don't know anything about Linux, but I guess that isn't really all that crucial for my needs at this point.

Thanks again for all the great advice!
 
rohgan03
post Jun 24 2004, 06:32 PM
Post #12


QUOTE
have already been planning on letting ordinary users vote for the search results


Google measures click-throughs and time spent on a site. That is a user vote without explicitly asking for one.
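Whatever Google does internally, an engine can compute this kind of implicit vote from its own logs. A minimal sketch of per-result click-through rate, assuming the engine records a (query, url) event each time a result is shown and each time it is clicked; the event format is hypothetical. Raw CTR favors results shown near the top, so in practice it would need to be compared against the typical CTR for each position.

```python
from collections import Counter


def click_through_rates(impressions, clicks):
    """Click-through rate per (query, url): clicks / times shown.

    `impressions` and `clicks` are lists of (query, url) events taken
    from the engine's own logs; pairs that were never shown are skipped.
    """
    shown = Counter(impressions)
    clicked = Counter(clicks)
    return {pair: clicked[pair] / count for pair, count in shown.items()}


impressions = [("php tutorial", "a.example")] * 4 + [("php tutorial", "b.example")] * 4
clicks = [("php tutorial", "a.example")] * 3 + [("php tutorial", "b.example")]
rates = click_through_rates(impressions, clicks)
print(rates[("php tutorial", "a.example")])  # 0.75
print(rates[("php tutorial", "b.example")])  # 0.25
```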
 
bauguss
post Jun 24 2004, 07:03 PM
Post #13


QUOTE
MySQL be able to handle the load or would another database work better?


Yes, I believe it would. Not only that, but there is a clusterable MySQL server available from mysql.com. Can you believe it is free too? I've never used it myself, but I'm sure it works great. At some point, obviously, you would want to support the hand that feeds you. I recommend that anyone who uses MySQL and makes money contribute something back to them.

Also, Google uses MySQL to some degree. I'm not sure whether it is some subset of their system or whether it is actually behind the search engine results. I do know that MySQL is extremely robust for a product that doesn't come from some huge corporate conglomerate. It outperforms SQL Server and Oracle in several areas, and that is with out-of-the-box compile options. Throw in optimizations and I'm sure it gets better and better.

Anyway, mysql.com is kind of like php.net: they have tons and tons of info available on their site.

As for PageRank, yeah, I think it is patented. Hopefully the patent office is going to get turned around soon, though. They have been issuing patents left and right with no regard to prior art. PageRank may sound like a great Google invention, but I would not be surprised if some academic folks had already come up with something similar (though maybe not internet-related). Software patents really need to go. That said, as long as it is so easy for one company to patent someone else's ideas (re: Amazon 1-click shopping and a few thousand others), you should be able to get one fairly easily.

There may be many who disagree, but I actually think PHP could ultimately work. PCs are getting faster. Memory is getting larger. And since you would have to scale anyway, why not do it with something that already does the dirty work for you? And hey, you have the source code, so you could always get some PhDs to work for you and rip out the junk you don't need.

And then there is always good design. You could have it cache search results so that it only has to compile them from MySQL once a day. I do this for a large pull-down menu on our site to keep it from asking MySQL for the same thing over and over. MySQL also has built-in caching that I think works pretty well. But think about caching search results to a static .html file.

CODE
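import hashlib
import os

def search(phrase, cache_dir, lookup):
    path = os.path.join(cache_dir, hashlib.md5(phrase.encode("utf-8")).hexdigest() + "_cache.txt")
    if os.path.exists(path):  # searchphrase_cache.txt exists: echo its contents
        with open(path, encoding="utf-8") as f:
            return f.read()
    results = lookup(phrase)  # else perform the SQL lookups (caller-supplied, returns a string)
    with open(path, "w", encoding="utf-8") as f:  # ...and create the cache file
        f.write(results)
    return results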


Obviously it is more complicated than that, as there are lots of search phrase variations. You could cache the more popular phrases and never cache the less popular ones (i.e., you could keep a searchcount value next to each search phrase in your search terms database, then cache any phrase whose searchcount is greater than x).
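The "cache only the popular phrases" idea can be sketched as follows, with an in-memory dict standing in for the cache files. `PopularQueryCache` and the normalization rule (lowercase, collapse repeated spaces) are hypothetical, and `threshold` plays the role of x.

```python
from collections import Counter


class PopularQueryCache:
    """Cache results only for phrases searched more than `threshold` times."""

    def __init__(self, lookup, threshold):
        self.lookup = lookup          # function: phrase -> results (the SQL work)
        self.threshold = threshold    # the "x" in "searchcount greater than x"
        self.search_count = Counter()
        self.cache = {}

    def search(self, phrase):
        key = " ".join(phrase.lower().split())  # normalize case and spacing
        self.search_count[key] += 1
        if key in self.cache:
            return self.cache[key]
        results = self.lookup(key)
        if self.search_count[key] > self.threshold:
            self.cache[key] = results
        return results


calls = []


def fake_lookup(phrase):
    """Stand-in for the real database query; records each call."""
    calls.append(phrase)
    return [phrase + " result"]


cache = PopularQueryCache(fake_lookup, threshold=2)
for _ in range(5):
    cache.search("PHP  tutorial")
print(len(calls))  # 3
```

Normalizing the phrase first lets minor variations such as extra spaces or capitalization share one cache entry; rare phrases never take up cache space.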
 
bauguss
post Jun 24 2004, 07:06 PM
Post #14


QUOTE(rohgan03 @ Jun 24 2004, 07:32 PM)
Google measures click thru and time spent on a site. This is user vote without explicitly asking for it.

They must do it with the Google Toolbar then?

I see a lot of companies doing that with toolbars now. The only downside is that you depend on people actually installing it. And when Yahoo has one, and Google has one, and Alexa has one, and ........

It's all about statistics, I guess. They obviously get a large enough sample from those who install it. And when they throw in features like AutoFill and pop-up blockers, all the more beneficial.

Oh, that and they suckered webmasters into obsessing over PageRank :)
 
rohgan03
post Jun 24 2004, 07:11 PM
Post #15


No, they don't do it with the toolbar. It's not hard to figure out click-throughs, plus how often and how soon the same user (session) does another search for the same thing or clicks on another site in the search results.
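The session-based signal described here can be sketched from a time-ordered log of searches and clicks. The sketch flags a click when the same session's next event (another search, or a click on another result) comes within a short window, on the assumption that a quick return suggests the page didn't satisfy the user. The 30-second window and the event format are hypothetical.

```python
def quick_returns(events, max_seconds=30):
    """Find clicks followed quickly by another action in the same session.

    `events` is a time-ordered list of (session_id, timestamp_seconds,
    kind, detail) tuples, where kind is "search" (detail = query) or
    "click" (detail = URL). Returns (session_id, clicked_url, seconds)
    for each click whose session acted again within `max_seconds`.
    """
    last_click = {}  # session_id -> (timestamp, url) of the pending click
    found = []
    for session, ts, kind, detail in events:
        if session in last_click:
            click_ts, url = last_click.pop(session)
            if ts - click_ts <= max_seconds:
                found.append((session, url, ts - click_ts))
        if kind == "click":
            last_click[session] = (ts, detail)
    return found


log = [
    ("s1", 0, "search", "php tutorial"),
    ("s1", 5, "click", "thin.example"),
    ("s1", 12, "search", "php tutorial"),   # searched again 7 s later
    ("s1", 15, "click", "good.example"),    # never came back
    ("s2", 0, "search", "mysql cluster"),
    ("s2", 4, "click", "docs.example"),
    ("s2", 600, "search", "apache modules"),  # a new search much later
    ("s3", 0, "search", "seo"),
    ("s3", 3, "click", "a.example"),
    ("s3", 10, "click", "b.example"),       # clicked another result 7 s later
]
print(quick_returns(log))  # [('s1', 'thin.example', 7), ('s3', 'a.example', 7)]
```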
 
This forum is sponsored by High Rankings, a Boston SEO Agency