
Generating A Sitemap For Millions Of Pages


7 replies to this topic

#1 eteare

eteare

    HR 2

  • Active Members
  • 47 posts
  • Location:south of Boston

Posted 01 September 2010 - 04:15 PM

So, your site has 50 MM pages (really, no kidding). But in Google's Webmaster Tools you can only submit Sitemaps with 50k URLs or fewer. You can make a Sitemap index that links to Sitemap indexes, but this quickly gets insane, especially when you have millions of URLs. Any suggestions?

#2 Michael Martinez

Michael Martinez

    HR 10

  • Active Members
  • 5,085 posts
  • Location:Georgia

Posted 02 September 2010 - 01:13 AM

QUOTE(eteare @ Sep 1 2010, 02:15 PM)
So, your site has 50 MM pages (really, no kidding). But in Google's Webmaster Tools you can only submit Sitemaps with 50k URLs or fewer. You can make a Sitemap index that links to Sitemap indexes, but this quickly gets insane, especially when you have millions of URLs. Any suggestions?


You seem to be suggesting that you have the capability of adding 50,000,000 URLs to a single file but that you cannot generate 1,000 files with 50,000 URLs in each. Is that correct?

ON EDIT: I'm not trying to be snide. I'm trying to determine if the tools/resources you have available have forced you into an all-or-nothing scenario with respect to dumping 50,000,000 URLs into a text file.

Edited by Michael Martinez, 02 September 2010 - 01:23 AM.
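
For anyone who does have a flat list of URLs to work from, the "1,000 files with 50,000 URLs in each" approach can be sketched roughly as below. This is only an illustration, not something anyone in the thread posted: the file names (`sitemap-0001.xml`, `sitemap_index.xml`), the `base_url` parameter, and the function names are all made up for the example, and it assumes the URLs can be read one at a time from some source such as a database export.

```python
from itertools import islice
from xml.sax.saxutils import escape

MAX_URLS = 50_000  # per-file URL limit mentioned in the thread
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def chunked(urls, size):
    """Yield lists of at most `size` URLs without loading everything at once."""
    it = iter(urls)
    while chunk := list(islice(it, size)):
        yield chunk


def render(root_tag, item_tag, locs):
    """Render a <urlset> or <sitemapindex> document with one <loc> per entry."""
    body = "".join(
        f"  <{item_tag}><loc>{escape(loc)}</loc></{item_tag}>\n" for loc in locs
    )
    return f'{XML_HEADER}<{root_tag} xmlns="{SITEMAP_NS}">\n{body}</{root_tag}>\n'


def iter_sitemap_files(urls, base_url, size=MAX_URLS):
    """Yield (filename, xml_text) for each sitemap, then for the index.

    File names and base_url are hypothetical; adjust to your own hosting.
    """
    names = []
    for n, chunk in enumerate(chunked(urls, size), start=1):
        name = f"sitemap-{n:04d}.xml"
        names.append(name)
        yield name, render("urlset", "url", chunk)
    index_locs = [f"{base_url}/{name}" for name in names]
    yield "sitemap_index.xml", render("sitemapindex", "sitemap", index_locs)
```

Each yielded pair can be written straight to disk, so only one chunk of URLs is held in memory at a time. With 50,000,000 URLs this produces 1,000 sitemap files plus a single index that lists them.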


#3 Mooro

Mooro

    HR 4

  • Active Members
  • 157 posts
  • Location:Loughborough, Leicestershire

Posted 02 September 2010 - 06:17 AM

I'd either follow the guidelines or not bother. I actually think not bothering is the best method. I came to that conclusion when making up over a gig of map content for a site with over five million URLs. It doesn't make a squat of difference in my opinion, and you'd be better off spending more time working on the internal navigation and discoverability of your content.

I'm convinced on-site discovery means more to the engines than you feeding them a site map: one is what they can find, the other is what you want them to find.

The biggest benefit a selection of maps will bring has been mentioned here a number of times since Vanessa Fox brought it up: categorise your site maps, and then you can see which sections of your site have indexation issues.

That's the best use for a map I know of, as I'm convinced they don't contribute to visitors or $$ one little bit.
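
As a rough sketch of the "categorise your site maps" idea, URLs could be bucketed by site section before the maps are written, so each section gets its own map and its own indexation numbers. The rule used here (first path segment equals section, with `root` for the home page) is purely an assumption for illustration; a real site may need a different mapping, and the function names are invented for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def section_of(url):
    """Return the first path segment as the section name (assumed convention)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "root"


def group_by_section(urls):
    """Group URLs into {section: [urls]} so each section can get its own sitemap."""
    groups = defaultdict(list)
    for url in urls:
        groups[section_of(url)].append(url)
    return dict(groups)
```

Each group can then be written out as its own map (split further if a section exceeds the 50k limit), which makes it easier to spot a section whose pages are not being indexed.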

#4 eteare

eteare

    HR 2

  • Active Members
  • 47 posts
  • Location:south of Boston

Posted 02 September 2010 - 07:55 AM

QUOTE(Michael Martinez @ Sep 2 2010, 02:13 AM)
You seem to be suggesting that you have the capability of adding 50,000,000 URLs to a single file but that you cannot generate 1,000 files with 50,000 URLs in each. Is that correct?

ON EDIT: I'm not trying to be snide. I'm trying to determine if the tools/resources you have available have forced you into an all-or-nothing scenario with respect to dumping 50,000,000 URLs into a text file.



OK, I see your point, and you have every right to be snide. I am the SEO person, not in the engineering dept where they build this ginormous directory, so give me a little bit of a break. Trying to do this without dev help, you know? :)

I do, however, believe VERY strongly in having an xml Sitemap that can be submitted to various webmaster tools, so I am just looking for tools that have helped people out. That's all.

#5 Michael Martinez

Michael Martinez

    HR 10

  • Active Members
  • 5,085 posts
  • Location:Georgia

Posted 02 September 2010 - 12:54 PM

QUOTE(eteare @ Sep 2 2010, 05:55 AM)
OK, I see your point, and you have every right to be snide. I am the SEO person, not in the engineering dept where they build this ginormous directory, so give me a little bit of a break. Trying to do this without dev help, you know? :)

I do, however, believe VERY strongly in having an xml Sitemap that can be submitted to various webmaster tools, so I am just looking for tools that have helped people out. That's all.


Really, I was NOT trying to be snide or rude. I was only asking what the scope/limit of your capabilities is.

Am I inferring correctly that you don't have the means to build a text file, and therefore you just want to know if something out there will crawl the site and build the sitemaps?

With 50,000,000 URLs I would strongly advise AGAINST using an outside crawler. It might be more productive to work with the engineering team to develop a redundant site navigation to assist with crawling and link-based prioritizing of important hub pages.

#6 Mooro

Mooro

    HR 4

  • Active Members
  • 157 posts
  • Location:Loughborough, Leicestershire

Posted 03 September 2010 - 05:33 AM

QUOTE(Michael Martinez @ Sep 2 2010, 06:54 PM)
It might be more productive to work with the engineering team to develop a redundant site navigation to assist with crawling and link-based prioritizing of important hub pages.


Wow, there's a strange distortion to the echo in here! ;)

QUOTE(Mooro @ Sep 2 2010, 12:17 PM)
you'd be better off spending more time working on the internal navigation and discoverability of your content.


I feel that if you have to rely on site maps for crawlers to discover your content, your navigation and architecture are letting you down.

@eteare - how come you feel so strongly that they bring value? What value and how is it quantifiable?

#7 Michael Martinez

Michael Martinez

    HR 10

  • Active Members
  • 5,085 posts
  • Location:Georgia

Posted 03 September 2010 - 10:07 AM

QUOTE(Mooro @ Sep 3 2010, 03:33 AM)
Wow, there's a strange distortion to the echo in here!


Not when people look at the original, unadulterated comment that talks about NOT using external crawlers.


#8 eteare

eteare

    HR 2

  • Active Members
  • 47 posts
  • Location:south of Boston

Posted 03 September 2010 - 10:12 AM

In my experience, giving WMTs the URLs does get them indexed more readily. But even better, Google will tell you how many of the pages in your Sitemap have been indexed, which is a nice stat to watch over time.



