Posted 16 September 2004 - 02:35 AM
Thanks for that clarification. Here is a SERP of one, lol. Well, the site is badly designed, so I am limited to what text is on the page until I sort it out for them. http://www.google.co.....e bristol uk"
<added> Re Kackle:
inurl:"the" = 5.72 million
inanchor:"the" = 5.73 million
intext:"the" = 5.73 million
What exactly does this prove, though? Funny, once again you bring up a conspiracy theory and accuse SEOs of 'covering up'. There is no cover-up; I really do not see what all this is about. On the one hand people are saying that Google is broken as it is full up, not adding any new pages, not indexing any pages, not returning any new pages in SERPs. I give an example of all of this being done, with cast-iron proof (no way could the site have been in the Google index, as the domain did not exist), yet you are accusing me of covering up and of not understanding programming.
Damn right I do not understand the programming behind Google, and why should I? I understand what is needed to allow me to do my job. Millions of people every day get into their car unaware of the workings of the internal combustion engine. Why? Because they do not give a damn HOW it works; they know they have to check the oil, water etc., and put petrol in, in order to make it work. This does not stop them getting into the car and driving it, though.
I really have not seen any major problems with Google with regard to adding sites etc. Sure, it has some sort of problem with partially indexed pages, and its SERPs can stink sometimes. But as for what is being said, I disagree.
Many are theorising about what is wrong, but with the greatest of respect, it is theory. I know it is based on fact with regard to the 4 bytes etc., but unless you are in the 'plex, you are theorising. I am not having a go at you, and not covering up; I fail to see where you get 'covering up' from. Dismissing something as 'not of interest' is hardly covering up.
Can you or someone else please state in black and white EXACTLY how this stops an SEO from doing their job when:
new sites are being indexed
new sites are being added
new sites are showing in the SERPs
I am not after a food fight here; I honestly feel that it is a bun fight for the sake of a bun fight. I do not understand programming to the level you speak of, but I can assure you there is no cover-up, as IMO there is NOTHING to cover up!
Posted 16 September 2004 - 06:16 AM
Let me be the second one to put my hand up and admit that, more so than OWG, I am not a programmer - I simply don't have the brain for it - and I have a hard time wrapping my mind around the concepts that are being discussed in this thread (but that doesn't stop me from trying!)
So this SEO (who, btw, comes from a marketing and copywriting background) is forced to take the non-techie view of what's happening, which is the way that all of my clients view it too, as it so happens. That view is that, looking at Google's SERPs, it is evident that something's amiss, and it's my job, as their SEO, to do all that I can to work around those problems.
Though I am sincerely grateful for the background info that both dmart and kackle have provided in this thread to explain why Google is broken, I have to say that I view it as 'interesting info' rather than 'critical info'. Why? Because there's nothing I can do to fix the core problems. And even if my clients could understand the thinking behind the 4-byte vs 5-byte argument, their concern (and hence mine too) is: what do we do to work around it?
Thankfully, in some cases, the factual evidence points to the various theories being wrong. For instance, I don't need to turn away new clients who don't yet have a domain name registered, because it seems that brand new sites are still being indexed, listed and ranked in the SERPs. But if the evidence changes on this, or any other of these issues, I will adjust my approach accordingly.
So yes, in terms of programming, I freely admit that I have no idea what I'm talking about, but I'm still proud to be called an SEO.
And no, I'm not covering anything up at all; I'm simply dealing with the consequences of Google's frailties, which is all that I can do really, and thankfully, it's all that my clients ask me to do.
Posted 16 September 2004 - 04:25 PM
And that about sums up the degree of my concern also lol
Posted 16 September 2004 - 07:09 PM
OK, let me get this straight, because I just don't understand anymore. The problem seems to go in about 50 different directions at once, and I get a bit lost.
What you are saying, Daniel, if I understand you correctly, is that Google can know about as many URLs as it likes (those are stored separately in a whole 'nother data file), but can only index content (the inverted index) for 2 to the power of 32 pages. Is that correct? So, Google may know that 10 billion URLs exist, but only actually keep content for 4 billion.
Now, that I don't get, but I am willing to accept it as true for the sake of moving the argument forward. Along those lines, let's assume everything you say is 100% correct: Google uses 4 bytes for docIDs, this maxes the "DB" out at some 4 billion pages, and this problem existed, at the very least in the past, at the same time that the total pages indexed reached or exceeded the maximum allowable.
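Just so I am sure I have the arithmetic straight, here is the kind of thing I picture: a toy C sketch of my own (names invented, nothing to do with Google's actual code) where the postings in an inverted index store 4-byte docIDs. The uint32_t field is the whole constraint: it can name at most 2^32 (roughly 4.29 billion) documents, no matter how many URLs are tracked elsewhere.

[code]
#include <stdio.h>
#include <stdint.h>

/* Toy inverted index: each posting holds a 4-byte docID.  The choice
   of uint32_t is the constraint -- it can name at most 2^32 documents,
   however many URLs are tracked in some other file. */
struct postings_list {
    const char *term;
    uint32_t    doc_ids[8];   /* 4 bytes per docID */
    int         count;
};

int main(void) {
    struct postings_list the = { "the", { 1u, 42u, UINT32_MAX }, 3 };

    printf("largest 4-byte docID: %lu (~4.29 billion)\n",
           (unsigned long)UINT32_MAX);
    for (int i = 0; i < the.count; i++)
        printf("'%s' occurs in doc %lu\n",
               the.term, (unsigned long)the.doc_ids[i]);
    return 0;
}
[/code]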
You seem to have a really good grasp of the problem, Daniel, which is indicated by your claim that "This is not a trivial issue."
So, given your understanding that this isn't a trivial problem, can I ask: what's the solution? If I put you in charge of fixing this problem, what would you do, how complicated would it be and how long would it take to "fix"? No problem is unfixable, and the solution may give the problem context.
I would be very interested to see you write about a solution; such an explanation would help me understand why this isn't trivial, and fill in the blanks between 'problem A exists' and 'the solution to A is very difficult'. Apologies if you have already explained this; I am more than willing to read previous writing explaining why a solution is, from your understanding, so difficult.
I thought a lot about this last night (in an attempt to think of evidence that is compelling), and AFAIK (I have a degree in IT that was based around C/C++ on Unix), the solution need not be massively complicated. As you have a better grasp of the nuances of search and this specific problem, maybe you can show me how the solution is vastly more complicated than I grasp.
If you could enlighten me, much appreciated.
Posted 17 September 2004 - 07:22 AM
"Err! Guys we are running out of IDs in the main index"
"No problem, we'll add another index"
"What are we going to call it ?"
"Oh I dunno, how about Supplemental?"
And the extra drive space for the index problem:
"Hello Seagate, Google Engineering here. We need a few more hard drives"
"Right, how many?"
"No Problem. They'll be about £12 each, next week Ok? "
And the "huge" problem of partial indexing appears to be nothing more than duplicate content pages. That is of course assuming I'm allowed to believe my own eyes and websites.
Posted 17 September 2004 - 07:00 PM
As many have said, Google has clearly been indexing new pages so why even go on the crusade to "prove" that Google's broken. If you go out to your car in the morning and it starts the second you turn the key, do you go to work and tell everyone how you opened the hood and measured the decibel levels of the engine and tested the temperature of the oil and it's clearly broken? No, you do what I do and drive to work. Why even worry about how it's programmed if it is doing what it's supposed to?
Now, if you can show me that there are no new pages in Google for several months then I'd be interested in hearing why Google's broken - otherwise, it's not.
Posted 17 September 2004 - 09:47 PM
He also, in trying to show how massively large the index would need to be, speaks of the amazing possibility of 1 billion pages in 2000. It seems pretty likely that they weren't building Google to handle a hell of a lot more than that.
With any luck, Google is furiously working on getting their technology in order so they can go back to being a reference for ALL of the web (which was their original goal), not just pages that are somehow deemed more important than others. The temporary tactics that they have put into place as band-aids seem to be bursting.
I'm not an SEO but I'm surprised that you aren't more interested in this.
Posted 17 September 2004 - 11:04 PM
What crusade? It is the most logical conclusion based upon the facts available. It sounds more like a crusade from your end, to prove that Google hasn't changed.
You are basing your conclusions on an "about 5 billion" display. How does that in any way prove to you that they are indexing 5 billion pages? I mean, logically, can anyone really say that that is some kind of proof?
Just because you all think Google is infallible doesn't mean they didn't plan the original design around 4 bytes, and doesn't mean that they planned to scale beyond that 4-byte index when clearly their documents state that they expected to be able to handle 1 billion pages.
Where is there ANY evidence other than the meaningless "the" search? On the other hand, there have been several pieces of evidence to back up the 4-byte theory, from reliable sources - the designers of the code in question. Why would anyone stand firm on something that has no basis in fact and discount something that does have some basis in fact?
So again, I guess you are saying that all the Y2K-trained programmers just weren't working on all the apps in the country "the right way", although you admit they were teaching the right way in universities for years. So they weren't dumb; they all just didn't do it right. But that would never happen again because... why, again? We are better trained, smarter, more modular - what exactly prevents us that didn't prevent them? And if you have ever worked on a large-scale project, you would know that what you are saying is wishful thinking, not fact.
The WMW thread has lots of people claiming exactly that, but it is pretty obvious that you and most of the posters here really are not interested in hearing anything about a possible problem until/unless it hits you in the face.
If you want to pretend that everything is the same as last year go ahead. All I am saying is that it is not a logical conclusion based upon fact, it is an emotional conclusion based upon marketing.
Posted 18 September 2004 - 01:27 AM
I think they could have made the switch to 5 bytes within a few months, assuming that they committed sufficient resources to the task.
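To be clear about why I don't think it would have been a trivial patch either: there is no native 40-bit integer type, so a 5-byte docID has to be packed into raw bytes by hand, and that touches every structure that stores, sorts, or compares docIDs. A rough C sketch of my own guesswork (nothing from Google, just an illustration):

[code]
#include <stdio.h>
#include <stdint.h>

/* Guesswork sketch: with no native 40-bit type, a 5-byte docID gets
   packed into raw bytes by hand -- one reason the migration touches
   every structure that stores or sorts docIDs. */
static void pack_docid40(uint8_t out[5], uint64_t id) {
    for (int i = 0; i < 5; i++)
        out[i] = (uint8_t)(id >> (8 * i));    /* low byte first */
}

static uint64_t unpack_docid40(const uint8_t in[5]) {
    uint64_t id = 0;
    for (int i = 0; i < 5; i++)
        id |= (uint64_t)in[i] << (8 * i);
    return id;
}

int main(void) {
    uint8_t buf[5];
    uint64_t id = 5000000000ULL;              /* bigger than 2^32 */

    pack_docid40(buf, id);
    printf("round-trip: %llu\n",
           (unsigned long long)unpack_docid40(buf));
    return 0;
}
[/code]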
Since it is now 17 months later, I have to assume that they decided instead to take a different approach. They were probably already beginning to realize that freshness was more important than an accurate PageRank. At the time of update Cassandra, the crawl happened once a month and the PageRank took several days to compute, and updating the data centers took a few more days. This amount of overhead didn't have much of a future. Sure, keeping PageRank pure was a nice thought, but how competitive could it be in coming years?
By Cassandra the freshbot was already doing fairly well. It was introduced in August 2001, so by April 2003, when Cassandra presumably overflowed the 4 bytes and Google had to revert to an earlier backup for the update, they had plenty of experience in integrating the freshbot data into the main index.
Also by April 2003, they were making money hand over fist on ads, and were looking ahead to an IPO.
I think at that point they decided to invest more resources into ads. For the main index, I think they decided to crank up the freshbot and keep PageRank patched together any way they could. They introduced a supplemental index, they started showing more URL-only listings, and PageRank started getting much looser than it had ever been. It used to be that PageRank was "guessed" until the next crawl by subtracting one for each directory level deep where a new page was found. But during the last half of 2003 and in 2004, it seems that except for very old, established pages, all PageRank is more or less a "guess." It also took only a few high-PageRank pages in 2004 to bring a new page up to the same level minus one. It was never that easy in the old days.
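If I am reading that old guessing rule right, it amounted to something as simple as this (my paraphrase of the heuristic in C, nothing official):

[code]
#include <stdio.h>

/* The old "guessed PR" rule as I understand it: until the next crawl,
   a new page inherits the linking page's PageRank minus one per
   directory level of depth, floored at zero. */
static int guess_pagerank(int parent_pr, int depth) {
    int pr = parent_pr - depth;
    return pr > 0 ? pr : 0;
}

int main(void) {
    /* e.g. a PR6 page linking to /products/widgets/, two levels deep */
    printf("guessed PR: %d\n", guess_pagerank(6, 2));
    return 0;
}
[/code]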
Now they haven't updated PageRank in three months. I think at some point, probably around update Florida, Google discovered that an unpredictable main index was good for their ad revenue.
To make a long story short, the question initially was whether Google should commit the resources to change to 5 bytes. It seems they decided to use stop-gap measures instead, and commit their resources to ads.
By now the question is whether Google even wants a strong main index. I don't think they care anymore. I believe they're going for broke to keep their ad revenues high in the short term. They're not thinking about the long term, because in the short term they'll all be millionaires at the Googleplex anyway. Five more months and everyone will be able to cash in their options; many can do it before then, starting in two months. Why care what happens to Google after you strike it rich? The hype about the "don't be evil" mission of Google is probably an inside joke by now. They're too busy counting their stock options.
Another benefit of this approach is that it will undo a lot of the manipulation that has grown into an industry surrounding Google, at the same time that it forces a lot of marketers into paid ads. Sure, it's bad news for Google in the long run, but I don't think the long run is even on the table as a consideration these days. There's also a serious question of whether Google could counter the spammers merely with better algorithms. Maybe they simply decided that they have to starve them all first, step over the bodies, and start over again with different algorithms at some point down the road.
It stopped being a technical question within a few months after update Cassandra. It became a marketing issue with Florida. By now it's a matter of whether they can keep most of the people fooled most of the time, as the main index deteriorates. The odds are in their favor.
Posted 18 September 2004 - 09:38 AM
I actually agree with you there.
I do think they would prefer to keep commercial stuff out of the main index any way they can, and force people to buy ads.
They would like the left half to be info sites, and the right half to be commercial sites (which have to pay for each click).
The question I've always had about that strategy is whether the users will buy it and get it. If they start to realize it and say, okay, I need to click on the ads if I want to find the products I'm looking for, then it could work. I don't think the avg. person is averse to ads as much as some of us might be who've been online a long time. They just want to find what they're looking for as best as they can.
So will it work? Will people be happy with that in the long run? They've got the advantage right now because Google is synonymous with search. I'd say they have at least a year or two to keep testing out the theory since there's not much better around.
And if it does work, the other engines will do the same thing. Which of course means that SEO will be all about providing informational sites to go into the free listings, and advertising your commercial site within it. Many have already turned to that strategy and it seems to work.
I don't know if doing all this has anything to do with G being full up though, but I do think it's the strategy they're after. It's a gamble, but like you said, if they're just gonna cash out anyway, it's a pretty good gamble on their part.
Posted 18 September 2004 - 11:25 AM
I also thought this might be indicated from the evidence we saw during Florida. The other possibility was the drive toward localization with certain terms, such as real estate and travel sites. Maybe both were happening.
But since then I've come to suspect that Google is more mayhem than method. I say this because my main site is a 126,000-page site that is purely information, nonprofit, and has zero commercial utility. Moreover, it's information indexed over the last 22 years from investigative books, 95 percent of which is not digitized anywhere on earth and probably never will be. It would be lost to history apart from my site.
It always seemed to me to be a no-brainer for any engine to index this site if they're at all interested in pure information. I naively wrote to Larry Page in November 2000, offering to send him a CD so he wouldn't have to struggle to crawl the site (back then it was all dynamic pages and it was a struggle for both of us, but for the last two years I've made available a set of all-static pages for crawlers). No reply from Mr. Page, of course.
For the four years that Google has been crawling this site, only half of my pages ever got indexed by Google. Today if you subtract the URL-only listings, it is less than half.
Curiously, a special crawl happened on my site over the Labor Day weekend. I've never seen anything like it in four years. They went after the deep name files only, got all of them, and didn't request any nonexistent files, essays, or sitemap pages. It was as if they took my CSV dump of the name files, re-sorted them by the number of characters in the directory/filename, and went from fewest characters to most. It all took a day and a half. About every 25 minutes a single IP address at Google would hit me for about two minutes, grabbing my little name files at a rate of up to 45 per second. If I didn't already know that Google is perfectly capable of this much hubris, I would have defined it as a denial-of-service attack and blocked them.
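To sanity-check my own numbers: that burst pattern is more than enough to cover the whole site in the window. Back-of-envelope only, using rough figures from my logs:

[code]
#include <stdio.h>

int main(void) {
    /* rough figures from my own logs */
    double files_per_sec = 45.0;    /* peak fetch rate        */
    double burst_secs    = 120.0;   /* ~2 minutes per burst   */
    double burst_gap_min = 25.0;    /* minutes between bursts */
    double window_hours  = 36.0;    /* "a day and a half"     */

    double bursts  = window_hours * 60.0 / burst_gap_min;    /* ~86   */
    double fetches = bursts * burst_secs * files_per_sec;    /* ~466k */

    printf("~%.0f bursts, up to ~%.0f fetches -- plenty for 126,000 files\n",
           bursts, fetches);
    return 0;
}
[/code]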
So far there is no evidence that my site is benefiting from this special crawl. It's quite possible that between the crawling and the indexing, it will all fall off the edge of the earth. It won't be the first time.
Yahoo finally has almost all of my pages, and has also passed them to MSN. I'm hopeful that MSN's new engine, which has been crawling heavily, will be nice to me also. I'll give it a few months, and if nothing improves with Google then I plan to disallow Google in my robots.txt for the first time ever. That's going to feel really, really good.
Posted 18 September 2004 - 11:42 AM
From Appendix A ("Advertising and Mixed Motives") of Brin and Page's "The Anatomy of a Large-Scale Hypertextual Web Search Engine":
Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is "The Effect of Cellular Phone Use Upon Driver Attention", a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98]. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.
Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious. A good example was OpenText, which was reported to be selling companies the right to be listed at the top of the search results for particular queries [Marchiori 97]. This type of bias is much more insidious than advertising, because it is not clear who "deserves" to be there, and who is willing to pay money to be listed. This business model resulted in an uproar, and OpenText has ceased to be a viable search engine. But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from "friendly" companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market. Furthermore, advertising income often provides an incentive to provide poor quality search results. For example, we noticed a major search engine would not return a large airline's homepage when the airline's name was given as a query. It so happened that the airline had placed an expensive ad, linked to the query that was its name. A better search engine would not have required this ad, and possibly resulted in the loss of the revenue from the airline to the search engine. In general, it could be argued from the consumer point of view that the better the search engine is, the fewer advertisements will be needed for the consumer to find what they want. This of course erodes the advertising supported business model of the existing search engines. However, there will always be money from advertisers who want a customer to switch products, or have something that is genuinely new. But we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.
Posted 18 September 2004 - 12:45 PM
And really, I have to kind of agree. I don't like it, but I can see why they might attempt this (if they are).
Posted 18 September 2004 - 03:18 PM
It seems like right now the Google ads are often more relevant than the natural results. Yet the Google ads are determined solely by the webmaster, while the natural results are determined by the complex algorithm and filters.
The two things Sergey said about advertising that I thought were interesting:
1)" For example, a search engine could add a small factor to search results from "friendly" companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market. "
2) "But we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm"
On another note --
I don't think I'd have a problem if Google decided to go with Jill's left-side-of-the-page and right-side-of-the-page theory. But they have to do it fully. How do you deal with consultants who on the surface don't look like they're selling anything, but in reality are selling their consulting services?
Plus it just opens the market up for someone else to come in and make a wholly integrated, relevant search engine (just as Google did initially). If Google only started limiting things and filtering things because their technology is temporarily f-ed up, then maybe we can look forward to them getting back to being the Google we all knew and loved once they fix it. If, on the other hand, it's part of the new business model, then we'll have to wait and see how that plays out.