Database Size Vs. Bleeding Page Rank
#1
Posted 08 December 2007 - 08:03 AM
I run a global directory that offers information to business travellers on a city level (e.g. restaurants in Barcelona).
- My site is in 8 languages and offers 4 categories per city (e.g. hotels, restaurants, car rental). About 100 cities are represented.
- At the moment about 4,000 suppliers (e.g. restaurants) are represented in the directory. They all have their own page within the directory, and each page is available in 8 languages.
- Every city has an "overall" page (with all 4 categories) and a page for each specific category (e.g. restaurants in Barcelona). This means that there are 5 category pages per city.
When you multiply this out, the database consists of roughly 40,000 unique pages ((100 cities x 8 languages x 5 categories = 4,000) + (8 languages x 4,000 suppliers = 32,000), so 36,000 to be exact).
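For anyone who wants to check the page count, here is a minimal sketch of the multiplication in Python. The variable names are made up for illustration; the figures are the ones quoted in the post. The two products add up to 36,000, a little under the rounded 40,000.

```python
# Figures quoted in the post; variable names are illustrative only.
cities = 100
languages = 8
category_pages_per_city = 5   # 4 single-category pages + 1 "overall" page
suppliers = 4_000

# City-level listing pages, one per city, language and category page.
listing_pages = cities * languages * category_pages_per_city

# One page per supplier in each language.
supplier_pages = suppliers * languages

total_pages = listing_pages + supplier_pages
print(listing_pages, supplier_pages, total_pages)
```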
To the best of my knowledge the number of pages on a domain is part of the Google algorithm. This would be an argument for offering all pages to Google for indexing. The downside is that the page rank of the (home)page(s) bleeds over the 40,000 pages. From a page rank point of view I think it is smart to link only to the 100 "overall" city pages in the English language, since people who are searching do this in English and on a city level (e.g. "Barcelona restaurants"). In that case I would shield the rest of the site from Google (e.g. via JavaScript). This means that Google only indexes the 100 most relevant pages and divides page rank amongst them.
My question: what is wise? Let Google index all pages, or only offer the pages I would really like Google to index and rank high?
#2
Posted 08 December 2007 - 08:40 AM
If you need those lower level supplier pages to appear in a SERP when someone searches for them, you'll definitely want them indexed. Personally I wouldn't wall those off because they're an excellent opportunity to target very specific, long tail search phrases.
At the end of the day you're really asking about how to sculpt your link popularity or PageRank. Google nowadays tells you to use nofollow to do this. We've discussed this a few times 'round the forums and there are views on both sides. Personally, I'm not yet convinced it's the wisest thing to do, especially considering that the search engines all seem to treat nofollow a little bit differently. So doing something because of the way one engine treats it, in order to gain an advantage, could have unintended negative effects on another engine. It's just risky when there's no standard method of handling it on the search engines' side of things.
Plus what happens if people start linking to your interior pages --which is a goal you want to shoot for-- but you've told the search engines that you don't consider those to be important pages of your site via nofollow? Nobody knows.
At the end of the day you're not going to be linking to all 40,000 interior pages from your home page anyway. So if you construct your internal navigation so that it's sensible for real users, you're also sculpting your internal link popularity/PR. Call me old fashioned, but I like to stick to the tried and tested methods rather than jumping on each new theory or bandwagon that comes along. If for no other reason than that brand new things typically change several times before they become tried and tested.
#3
Posted 08 December 2007 - 03:13 PM
Thank you for your reply.
If you need those lower level supplier pages to appear in a SERP when someone searches for them, you'll definitely want them indexed. Personally I wouldn't wall those off because they're an excellent opportunity to target very specific, long tail search phrases.
To be honest I'm mainly interested in a high rank for the 'city' pages. At the moment most city pages rank within the top 20 when you search Google with the relevant keywords. I'd rather get a higher ranking for them than rank for all possible 'long tail' searches. My question is really about quality vs quantity: will my 100 'city pages' rank higher when all 40,000 pages are included, or will they rank higher when I restrict indexing of the site, link only to those 100 pages from the homepage and 'hide' the rest of the site from the SEs (for example by using JavaScript) so that my page rank is distributed only over relevant pages?
By the way: my biggest competitor who really has excellent (top 3) rankings for most of the searches (searches with more than 200,000 results), has only 712 pages indexed!
#4
Posted 09 December 2007 - 07:36 PM
Actively 'optimising' for long tail keywords on 20 or so pages is good, but it's not really the same as what Randy is talking about (I presume). Those terms would have to be short-tailed enough (statistically) for you to have a clue that they are even being searched for (i.e. be picked up by Wordtracker or Keyword Discovery, or get enough AdWords searches for you to notice). The really long tail is terms that may only be searched once, ever, or that only bring you a handful of searches per month. Several sites I am involved with pick up almost half of their organic traffic from phrases that each bring in under 5 visits per month, and maybe 20% from terms that bring only 1 visitor. That's the really long tail, and it's not possible or practical to 'optimise' for these individually.
If you have 40,000 pages with unique content you have a chance at picking up a lot of these one off search terms. Particularly if the content is contributed by individual vendors.
In your shoes, I'd probably consider going down the conventional route of trying to get as many pages indexed as possible. Then I would try to determine how many searchers land on internal pages (meaning they are getting SE traffic). Then you can make a more informed decision about the potential costs/benefits of 'shaping' your PR by excluding internal pages.
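As a rough illustration of that measurement, here is a minimal Python sketch. It assumes you can export organic search visits as (landing page, search phrase) pairs from your stats package. The visit data, URLs and function names below are all invented for the example, and the "fewer than 5 visits" cut-off is just the figure mentioned earlier in this post.

```python
from collections import Counter

# Hypothetical month of organic search visits: (landing page, search phrase).
visits = (
    [("/", "business travel directory")] * 12
    + [("/barcelona/restaurants", "barcelona restaurants")] * 8
    + [("/barcelona/restaurants/king-juan-carlos",
        "king juan carlos restaurant barcelona")] * 3
    + [("/madrid/hotels/example-hotel",
        "quiet hotel near madrid conference centre")] * 1
    + [("/paris/car-rental/example-cars",
        "car rental paris late return")] * 1
)


def long_tail_share(visits, max_visits_per_phrase=4):
    """Fraction of visits from phrases searched at most N times."""
    phrase_counts = Counter(phrase for _, phrase in visits)
    tail_visits = sum(
        count for count in phrase_counts.values()
        if count <= max_visits_per_phrase
    )
    return tail_visits / len(visits)


def internal_landing_share(visits, home_page="/"):
    """Fraction of visits that landed on any page other than the home page."""
    internal = sum(1 for page, _ in visits if page != home_page)
    return internal / len(visits)


tail_share = long_tail_share(visits)
internal_share = internal_landing_share(visits)
print(f"long tail share: {tail_share:.0%}")
print(f"internal landing share: {internal_share:.0%}")
```

Run against real data, the two shares give a rough idea of how much traffic would be at risk if the internal pages were hidden from the search engines.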
One thing to consider is that with such a big site, unless it has a lot of link strength, getting SEs to take internal pages seriously is enough of a challenge. So, trying to keep them indexed but restricting their page rank with nofollows will probably not be possible.
#5
Posted 10 December 2007 - 09:37 AM
I'd move forward exactly as nethy laid out. Try to get as many pages as possible indexed and then spend some time tracking down which inner pages are getting SE traffic and try to optimize them for the longer tail terms that are going to bring you some traffic.
More qualified traffic is always better than trying to sculpt PR in my book. Because as mentioned above these longer tail terms typically make up a statistically significant portion of my traffic. No one phrase does, but when you add all of those together it's certainly worth the time investment. Especially when they're so darned easy to rank for!
I won't even mention that these more specific, longer tail terms tend to carry a several times higher conversion rate than my more generic phrases.
Edited by Randy, 10 December 2007 - 09:43 AM.
#6
Posted 10 December 2007 - 02:02 PM
My idea: put only links to the city pages on the homepage. Block the rest of the links on the homepage (contact info, FAQs) with JavaScript (onclick). This will block Google from indexing these pages. Do the same thing (onclicks) with links from the city pages (e.g. restaurants in Barcelona) to detail pages (e.g. a specific restaurant in Barcelona). In this structure Google cannot reach the detail pages through those links, so no pagerank is being leaked to the detail pages.
To let Google index these detail pages, I'm thinking of setting up a sitemap with links to the detail pages. This allows Google to index these detail pages. I think this has another advantage: from the navigation bar in every detail page (e.g. home --> barcelona --> restaurants --> King Juan Carlos) you can create an internal link to the city page (in this example the word 'restaurants'). You can include a relevant alt tag to this link (in this case 'barcelona restaurants').
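The post doesn't say what kind of sitemap is meant. Assuming an XML sitemap in the sitemaps.org format, a minimal Python sketch for generating one could look like this. The domain, the URLs and the `build_sitemap` helper are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical detail-page URLs; a real site would read these from its database.
detail_urls = [
    "http://www.example.com/barcelona/restaurants/king-juan-carlos",
    "http://www.example.com/barcelona/hotels/example-hotel",
]

sitemap_xml = build_sitemap(detail_urls)
print(sitemap_xml)
```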
#7
Posted 10 December 2007 - 10:59 PM
Instead, create your site architecture in such a way that your important pages are where people can find them easily. Yeah I know...what a concept!
#8
Posted 11 December 2007 - 01:51 AM
Hi Jill
Thank you for your reply. From my point of view, in the structure I came up with all pages are actually being indexed, while the page rank of the homepage is distributed over the most important pages (the city pages). By getting the detail pages indexed via the sitemap and linking them internally, one way, to the relevant city page, I think all page rank goes to the pages I'd really like to rank high in the searches.
From your response I understand you prefer to have a normal SE structure (home --> city page --> detail page). IMHO in that structure the city pages give away a part of their page rank to the detail page. Disclaimer: I learned about the concept of avoiding leaking page rank to less important pages only last weekend. I'm still digesting all the views on it.
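To make the two structures concrete, here is a toy Python sketch using the simplified textbook PageRank formula (damping factor 0.85) on a tiny made-up site: a home page, 3 city pages and 4 detail pages per city. Everything here is an assumption for illustration. It is not how Google actually ranks pages, and the page names and helper functions are invented. In the "conventional" version the city pages link to their detail pages; in the "sculpted" version they don't, and the detail pages only link back to their city.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Simplified textbook PageRank; links maps page -> list of linked pages."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its score over every page.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def build_site(n_cities, n_details, link_details_from_city):
    """Toy site: home -> cities, city -> home (+ details), detail -> city."""
    links = {"home": [f"city{c}" for c in range(n_cities)]}
    for c in range(n_cities):
        city = f"city{c}"
        details = [f"{city}/detail{d}" for d in range(n_details)]
        links[city] = ["home"] + (details if link_details_from_city else [])
        for detail in details:
            links[detail] = [city]
    return links


conventional = pagerank(build_site(3, 4, link_details_from_city=True))
sculpted = pagerank(build_site(3, 4, link_details_from_city=False))

for label, ranks in (("conventional", conventional), ("sculpted", sculpted)):
    print(label, "home:", round(ranks["home"], 4),
          "city:", round(ranks["city0"], 4),
          "detail:", round(ranks["city0/detail0"], 4))
```

In this particular toy graph the city pages end up with the same score either way, because each detail page hands its score straight back to its city page. What changes is the split between the home page and the detail pages. A real site has far messier linking, so treat this as a thought experiment only.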
#9
Posted 11 December 2007 - 05:18 AM
there is only one answer to it
It's bull
and will just distract you from doing something FAR more useful.
#10
Posted 11 December 2007 - 09:24 AM
#11
Posted 11 December 2007 - 11:50 AM
#12
Posted 18 May 2008 - 10:51 AM
Why does your robots file look like this?
Disallow: /webtop.log
Disallow: /webtop/
Disallow: /stuff/contentmgr/templates/
Disallow: /forum/index.php?act
Disallow: /forum/index.php?showuser
Disallow: /forum/lofiversion/
Disallow: /pplmg
Disallow: /ngphp
Disallow: /dwgiwlzpn
Disallow: /atlbizchron
Disallow: /isalesatl
Disallow: /seminar/
Disallow: /nittyhra
Disallow: /register/
Disallow: /cc/
Disallow: /atlanta
Disallow: /may03-seo-seminar.htm?c1
Disallow: /gliadel/
Disallow: /currentstats/
Disallow: /msp/
Disallow: /webceo
Disallow: /newsletter/question2.php
Disallow: /forum/lofiversion/
Disallow: /march-training-class
Disallow: /april-seo-training
User-agent: Googlebot
Disallow: /webtop.log
Disallow: /webtop/
Disallow: /stuff/contentmgr/templates/
Disallow: /pplmg
Disallow: /ngphp
Disallow: /dwgiwlzpn
Disallow: /atlbizchron
Disallow: /isalesatl
Disallow: /forum/index.php?act
Disallow: /forum/index.php?showuser
Disallow: /seo-writing.htm?c1
Disallow: /advisor.htm?c1
Disallow: /seminar/
Disallow: /nittyhra
Disallow: /register/
Disallow: /cc/
Disallow: /atlanta
Disallow: /may03-seo-seminar.htm?c1
Disallow: /gliadel/
Disallow: /currentstats/
Disallow: /*getlastpost
Disallow: /*mode=linearplus
Disallow: /*mode=threaded
Disallow: /forum/lofiversion/
Disallow: /msp/
Disallow: /webceo/
Disallow: /newsletter/question2.php
Disallow: /march-training-class
Disallow: /april-seo-training
#13
Posted 18 May 2008 - 11:49 AM
Mostly to stop duplicate URLs from being indexed, as well as parts of my CMS that don't belong in the index.
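A quick way to sanity-check rules like these is Python's standard `urllib.robotparser` module. The sketch below uses a three-line excerpt of the Googlebot block from the listing above; the `www.example.com` domain and the `googlebot_may_fetch` helper are placeholders. As far as I know the standard-library parser only does plain prefix matching, so wildcard rules such as `/*getlastpost` are left out of the excerpt.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the Googlebot rules from the listing; prefix rules only.
rules = """\
User-agent: Googlebot
Disallow: /webtop/
Disallow: /seminar/
Disallow: /forum/lofiversion/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def googlebot_may_fetch(path):
    """True if the excerpted rules let Googlebot crawl the given path."""
    # www.example.com is a placeholder domain for the example.
    return parser.can_fetch("Googlebot", "http://www.example.com" + path)


lofi_allowed = googlebot_may_fetch("/forum/lofiversion/index.php")
forum_allowed = googlebot_may_fetch("/forum/index.php")
print("lofi version crawlable:", lofi_allowed)
print("normal forum crawlable:", forum_allowed)
```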
#14
Posted 19 May 2008 - 12:03 AM
I've been trying for the past couple of days to figure out how to remove most of the string or rewrite it with mod_rewrite, but haven't been successful yet. Should I nofollow those links or use robots.txt on them?
#15
Posted 19 May 2008 - 12:19 AM
If you want them to stand any chance of getting indexed, you certainly don't want to start nofollowing them or excluding them via robots.txt.