
Indexing Of Large Site


23 replies to this topic

#1 JakeG

    HR 4

  • Active Members
  • 212 posts
  • Location:Nottingham, UK

Posted 09 May 2009 - 05:13 AM

Hi,

I launched a large site (around 100,000 URLs) about two weeks ago. After about a week Google had indexed 50 URLs, but that number hasn't really changed since then (only a couple more picked up). I was wondering how long it will take to get these URLs indexed; Google has been back to the site a couple of times but has only added a few URLs each time. Most of the URLs indexed are top-level URLs from the navigation, or one click down (as you would expect from a first crawl), but a couple of the new ones seem to have been chosen at random from deep within the site. Is it normal for this type of thing to happen early on, before I get deep-crawled?


#2 Rajesh

    HR 4

  • Active Members
  • 236 posts
  • Location:USA

Posted 09 May 2009 - 06:42 AM

For fast indexing you should submit an XML sitemap to Google and work on off-page promotion: submit your URLs to search engines and directories, submit articles, and so on.
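
To make that concrete, here is a minimal sketch of what a basic XML sitemap contains and one way to generate it. This is only an illustration: Python, the file name, and the www.example.com URLs are placeholder assumptions, not anything from this thread.

CODE
# Minimal sketch: write a basic XML sitemap for a few pages.
# The domain and URL paths are hypothetical placeholders.
from xml.sax.saxutils import escape

urls = [
    "http://www.example.com/",
    "http://www.example.com/category/widgets",
    "http://www.example.com/category/widgets/product-1",
]

with open("sitemap.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        f.write("  <url><loc>%s</loc></url>\n" % escape(url))
    f.write("</urlset>\n")

The resulting file is then referenced from Google Webmaster Tools, or from a Sitemap: line in robots.txt, so the crawler can find it.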

#3 Jill

    Recovering SEO

  • Admin
  • 33,244 posts

Posted 09 May 2009 - 08:13 AM

It's only been 2 weeks. What other forms of marketing are you doing to get the word out about the site?

Google doesn't just automatically index thousands of pages unless they believe they will be useful to people and aren't the same thing they already have indexed from other sites.

#4 JakeG

    HR 4

  • Active Members
  • 212 posts
  • Location:Nottingham, UK

Posted 09 May 2009 - 11:41 AM

We've been promoting the site through press releases and on social networks. That side of things is going well: we're getting lots of referrals and people searching for our brand name. There are not that many decent links pointing to the site yet, so maybe that's why we haven't got many pages indexed. Assuming that the pages we have are seen as useful (which I think they are, since they have content that isn't anywhere else), when should I expect to see a deeper crawl occurring?

#5 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 09 May 2009 - 11:59 AM

QUOTE
There are not that many decent links pointing to the site yet so maybe this is why we've not got too many pages indexed.


The above would be a pretty safe assumption to make. As Jill said, the engines need a good reason to spider the site. The more pages, the more interest you need to generate for the spiders. As long as you have a sensible, crawlable navigation structure and don't throw any other technical challenges in their way, this usually comes down to external links pointing to your site. Ideally you'll want links pointing to the root domain, the 1st level pages, the 2nd level pages and so on. The more entry points you give them the better, especially for larger sites.
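
As a rough illustration of why extra entry points matter on a large site, here is a sketch that measures how many clicks each page sits from its nearest entry point. The link graph, page paths, and Python are all invented for the example; a real site would build the graph from its own URL data.

CODE
# Sketch: breadth-first search over an internal link graph to see how
# many clicks each page sits from the nearest entry point. More entry
# points (pages that have external links) means shallower depths.
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/cat-a", "/cat-b"],
    "/cat-a": ["/cat-a/sub-1", "/cat-a/sub-2"],
    "/cat-b": ["/cat-b/sub-1"],
    "/cat-a/sub-1": ["/product-42"],
}

def click_depths(entry_points):
    depth = {page: 0 for page in entry_points}
    queue = deque(entry_points)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# With only the homepage as an entry point, /product-42 is 3 clicks
# deep; add one deep link to /cat-a/sub-1 and it drops to 1 click.
print(click_depths(["/"]))
print(click_depths(["/", "/cat-a/sub-1"]))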

QUOTE
Assuming that the pages we have are seen as useful (which I think they are since they have content that isn't anywhere else), when should i expect to see a deeper crawl occuring?


It's one of those It Depends things. It depends upon how many different places out there are telling the spiders they need to sit up and take notice of your new site, as well as how many point them to something deeper in your structure to give them new entry points.

Best case scenario you're probably looking at a couple of months. If you don't gain many new links or if most of the links all point to the same root/1st level pages it'll take longer. Sometimes many months longer.

It sounds like you're on the right track by having unique, valuable content and putting in the leg work to get it noticed. I'd keep doing what you're doing and give it a couple of months to see where you stand then.

#6 Jill

    Recovering SEO

  • Admin
  • 33,244 posts

Posted 09 May 2009 - 11:59 AM

Deeper crawling will happen as you get more links pointing to the pages. Especially if you can get deep links.

Plus, it really will depend on your site architecture. Do you have main categories all listed in your main, spiderable navigation? And do you have subcats all visible from there?

If lots of your pages are only linked to via one link somewhere within the site, they will likely take a while to be spidered, and once they are, they're unlikely to show up in the SERPs for anything relevant.

#7 JakeG

    HR 4

  • Active Members
  • 212 posts
  • Location:Nottingham, UK

Posted 09 May 2009 - 12:32 PM

Thanks a lot for the advice guys, I'll try to get some external links to deeper pages.

Regarding the site architecture, I really don't think it could be much better: each main category is listed in the navigation and acts as a "mini site map" with links to each sub-category, and we have a footer navigation that links into certain deeper areas. All product pages link back to the categories they are in as well. Internal navigation was something we put a lot of thought into. It's the sheer number of pages that I think is the problem; I've never launched a site this large, so I don't know what to expect. I have read some stuff along the lines of "you need to have PR x to have y pages indexed"... but I'm not sure what evidence that is based on!

I'll take the advice here and just keep at it, and put off worrying about it till a couple of months down the line.

#8 Michael Martinez

    HR 10

  • Active Members
  • 5,325 posts
  • Location:Georgia

Posted 13 May 2009 - 06:59 PM

If you submitted XML sitemaps you should see SOME additional page crawl for a while, not just the same 50 URLs. I've gotten large sites indexed just by submitting XML sitemaps (not the fastest way in my mind, but it works over time).

If you have NOT submitted XML sitemaps, try doing that. It won't substitute for the inbound links you'll want but it will help create some search visibility for the site.
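
For anyone trying this, the Sitemaps protocol also describes an HTTP "ping" for telling an engine where a sitemap lives. A hedged sketch follows; the sitemap URL is a placeholder, and the exact ping endpoint should be checked against Google's current documentation.

CODE
# Sketch: ping Google with the location of a sitemap. The sitemap URL
# is a placeholder; confirm the ping endpoint against current docs.
# An alternative is a "Sitemap:" line in your robots.txt file.
import urllib.parse
import urllib.request

sitemap_url = "http://www.example.com/sitemap.xml"  # placeholder
ping = ("http://www.google.com/ping?sitemap="
        + urllib.parse.quote(sitemap_url, safe=""))
response = urllib.request.urlopen(ping)
print(response.getcode())  # 200 means the ping was received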

#9 JakeG

    HR 4

  • Active Members
  • 212 posts
  • Location:Nottingham, UK

Posted 21 May 2009 - 06:07 AM

OK thanks Michael, I'll give XML sitemaps a go.

The site was launched around three weeks ago and it seems Google is slowly starting to take notice. Only a couple of hundred pages are indexed so far, but as of today, when I search site:mydomain in Google, it tells me there are around 1,300 pages (e.g. it says "pages 1-10 of about 1,300"). However, when I click through to the last page of the results, I can see there are still only around 200. Does anybody have any idea why there is such a big discrepancy between the number of pages it says are indexed and the number actually there? I'm hoping the number of actual pages will catch up soon!

#10 Michael Martinez

    HR 10

  • Active Members
  • 5,325 posts
  • Location:Georgia

Posted 21 May 2009 - 05:12 PM

About 1,000 computers collaborate to return your search results. The estimates Google provides may be indicative of a lot of overlapping data, or they may reflect pages whose URLs are known but which have not been crawled, or perhaps it's just bad math on Google's part. They generally advise people not to put too much stock into those estimates (although why they leave them there, I don't know).

Go with what you can find in the index.

200 pages after 3 weeks is about average, assuming you don't have a lot of powerful sites linking to your site. You just have to wait for the crawling process to work out. It's not like Google immediately deep-crawls every new site. It grabs a page, pulls the URLs from it, adds the URLs to one or more crawling queues, and then those URLs have to wait for a crawler program to grab them again.

Once in a while Google schedules a site for a deep-crawl, where it grabs as many pages as it can find immediately, but it would be a rare new site (in my opinion) that merited a deep-crawl.
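
A toy simulation of that discover-then-fetch cycle may help show why pages surface in waves rather than all at once. The link graph and the per-visit fetch budget below are invented for illustration; real crawlers are vastly more complicated.

CODE
# Sketch: URLs enter a queue when discovered but are only fetched on a
# later pass, so deep pages surface over several visits, not at once.
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/cat-a", "/cat-b"],
    "/cat-a": ["/cat-a/sub-1", "/cat-a/sub-2"],
    "/cat-b": ["/cat-b/sub-1"],
    "/cat-a/sub-1": ["/product-42"],
}

def simulate_crawl(start, budget_per_visit, visits):
    queue, seen = deque([start]), {start}
    for visit in range(1, visits + 1):
        for _ in range(budget_per_visit):
            if not queue:
                break
            page = queue.popleft()
            for target in links.get(page, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        print("after visit %d: %d URLs discovered" % (visit, len(seen)))

# With a budget of 2 fetches per visit, /product-42 is discovered on
# the second visit but not fetched until the fourth.
simulate_crawl("/", budget_per_visit=2, visits=4)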



#11 OptimalPages

    HR 2

  • Members
  • 36 posts
  • Location:Central Ohio

Posted 21 May 2009 - 06:06 PM

If I'm not mistaken, when submitting a sitemap.xml to Google, the maximum number of URLs allowed is 500. I think there is a request form you can submit for more, though.

#12 BBCoach

    HR 5

  • Moderator
  • 402 posts

Posted 21 May 2009 - 07:57 PM

QUOTE
If I'm not mistaken, when submitting a sitemap.xml to Google, the maximum allowed URLs is 500
You are mistaken. Here's the quote from G.

QUOTE
Google WebMaster Help - You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 10MB (10,485,760 bytes) when uncompressed. These limits help to ensure that your web server does not get bogged down serving very large files.


QUOTE
Please note that the Sitemap Protocol supplements, but does not replace, the crawl-based mechanisms that search engines already use to discover URLs. By submitting a Sitemap (or Sitemaps) to a search engine, you will help that engine's crawlers to do a better job of crawling your site.
And therefore, why would you want to waste time doing this? Build your site right in the first place!
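
For a 100,000-URL site, those limits mean at least two sitemap files plus a sitemap index that points at them. A sketch of the split follows; Python and the www.example.com URLs are placeholders for illustration.

CODE
# Sketch: split a large URL list across several sitemap files (staying
# under the 50,000-URL limit quoted above) and write a sitemap index.
from xml.sax.saxutils import escape

MAX_URLS = 50000
urls = ["http://www.example.com/page-%d" % i for i in range(100000)]

chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
for n, chunk in enumerate(chunks, 1):
    with open("sitemap-%d.xml" % n, "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in chunk:
            f.write("  <url><loc>%s</loc></url>\n" % escape(url))
        f.write("</urlset>\n")

with open("sitemap-index.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for n in range(1, len(chunks) + 1):
        f.write("  <sitemap><loc>http://www.example.com/sitemap-%d.xml"
                "</loc></sitemap>\n" % n)
    f.write("</sitemapindex>\n")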

#13 JakeG

    HR 4

  • Active Members
  • 212 posts
  • Location:Nottingham, UK

Posted 22 May 2009 - 10:30 AM

OK, thanks for the info, I'll have to go with what's in the index. It's just that the number of estimated pages keeps getting bigger and bigger, and it's getting me excited :) We now have quite a few decent links, so hopefully things will speed up.

#14 Michael Martinez

    HR 10

  • Active Members
  • 5,325 posts
  • Location:Georgia

Posted 22 May 2009 - 07:10 PM

If you want to get a large site crawled quickly, submit a sitemap and ignore advice against doing so. It doesn't guarantee that you'll be crawled more quickly but it doesn't hurt anything.

And I have gotten several hundred large sites crawled and indexed by using XML sitemaps. They work perfectly fine. On average, they were crawled and indexed faster than the several hundred large sites for which I did NOT submit XML sitemaps.

The "500" limit that OptimalPages may be thinking of is most likely the 500 domains per Google Webmaster Tools account. I believe the "fix" for that was to either ask to have more capacity added to your account or to open a new account.

I don't much like packing hundreds of domains into a single account as the user interface becomes unwieldy. They implemented it in AJAX or something equally slow and inefficient and it takes forever to get to whatever you want.

Understand that crawling doesn't begin until the search engine knows about the URLs. The sooner you get those URLs into the search engine, the sooner it will start fetching those pages.

You could, if you have a blog, use your own blog to link to important deep pages within the site. If the blog is pinging and being indexed within a day of posting then you can look forward to seeing those links fetched pretty quickly. I would not put more than a couple dozen links in any blog post. Nor would I sit there and rattle off blog posts one after the other. But it's an option for anyone who has an existing blog (or a Blogger blog, for that matter) who wants to help a large Web site get indexed.

Exercise moderation, expect anything you do to take time, and look for data that will help you down the road as your pages get indexed. See which long tail queries provide the best traffic for you. See which pages provide the best user experience. See which of your pages are recrawled the most.

Analyze what you're doing but don't agonize over whether you're doing enough. Do what you feel confident in doing.

Even the most experienced SEOs know they have to be patient and let the process roll out.

It certainly didn't help that Google announced a recrawl of the Web last week -- or maybe that was great timing. You'll have to wait another couple of weeks to see if they pick up the pace for you.


#15 BBCoach

    HR 5

  • Moderator
  • 402 posts

Posted 24 May 2009 - 05:13 PM

QUOTE
If you want to get a large site crawled quickly, submit a sitemap and ignore advice against doing so. It doesn't guarantee that you'll be crawled more quickly but it doesn't hurt anything.
You can't have it both ways, Michael. Which is it? Stand for something! Apparently you haven't tested it, and that's why you have a conflicting opinion.

I'll say it straight up: sitemaps will NOT get you crawled deep and fast, as Michael's logic would lead you to believe! It's your site design, and the external links to those deep pages, that will determine the depth of a crawl. My testing, and as a result my opinion, says don't waste your time with a sitemap file. Your time would be better spent submitting a product upload file to the various SEs. Spitooh on sitemaps. Submit product feeds instead.



