Sitemap Protocol Updated


31 replies to this topic

#1 qwerty

qwerty

    HR 10

  • Moderator
  • 8,296 posts
  • Location:Somerville, MA

Posted 11 April 2007 - 12:50 PM

You can now point spiders to your sitemap from your robots.txt. Just add
CODE
Sitemap: http://www.domain.com/sitemap.xml.gz
(or whatever the URL of your sitemap is) anywhere in the file.

See http://www.sitemaps....l#submit_robots
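For anyone curious how a spider might pick that line up, here is a minimal Python sketch. It is only an illustration, not how any particular engine does it. The function name `find_sitemaps` is made up for this example, and it assumes the field name is matched case-insensitively and that `#` starts a comment.

```python
def find_sitemaps(robots_txt):
    """Return every sitemap URL listed in the text of a robots.txt file."""
    sitemaps = []
    for raw_line in robots_txt.splitlines():
        # Assumption: '#' starts a comment, as elsewhere in robots.txt.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, because the URL contains colons too.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


example = """\
User-agent: *
Disallow:

Sitemap: http://www.domain.com/sitemap.xml.gz
"""
print(find_sitemaps(example))
```

Because the sketch looks at each line on its own, the Sitemap line can sit anywhere in the file, which matches the "anywhere in the file" wording of the announcement.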

#2 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,326 posts

Posted 11 April 2007 - 03:15 PM

Cool! Thanks for the info, Bob!

I was going to attend the sitemap session here at SES, but the room was packed tight by the time I got there.

#3 harpsound

harpsound

    HR 4

  • Active Members
  • 222 posts
  • Location:Victoria BC Canada

Posted 11 April 2007 - 04:52 PM

Okay
Up until now I have used a site map generator to create my .xml
One with Google specs as a Google-sitemap.xml
One to Yahoo specs as a urllist.txt

I have now generated a non-Google-specific generic .xml
Put it in the robots.txt
And resubmitted it to Google Webmaster Tools also.
The webmaster tools rejected the generic .xml file outright.

Am I being too early an adopter here?
The two different submission approaches appear to be incompatible at this time for the same generic.xml file.

The generator
www.auditmypc.com/free-sitemap-generator.asp

Thoughts? Ideas?

Stephen V

#4 qwerty

qwerty

    HR 10

  • Moderator
  • 8,296 posts
  • Location:Somerville, MA

Posted 11 April 2007 - 07:33 PM

The version that Google has accepted in the past should work for Yahoo, MSN and (now) Ask as well. I don't know the generator you're using, but if it's ok with G, it should be ok.

#5 harpsound

harpsound

    HR 4

  • Active Members
  • 222 posts
  • Location:Victoria BC Canada

Posted 11 April 2007 - 11:15 PM

That appears to be the case.
My Yahoo account has "processed" the Google version.

S

#6 Shlomo

Shlomo

    HR 2

  • Members
  • 12 posts

Posted 16 April 2007 - 08:13 AM

The possibility to define the location of sitemap.xml in robots.txt
is an important advancement.
I think that the Search-engines should go one step further
and allow the webmasters to include in the robots.txt
all the functions of the sitemap.
So that we will be able to provide these 2 sets of info
in one file.

#7 qwerty

qwerty

    HR 10

  • Moderator
  • 8,296 posts
  • Location:Somerville, MA

Posted 16 April 2007 - 11:35 AM

That would be nice, but I doubt it would be possible. The sitemap protocol calls for an XML file, and robots.txt has a very different structure from that. If the robots exclusion protocol were updated to allow for XML, then maybe the two could be combined.

#8 Shlomo

Shlomo

    HR 2

  • Members
  • 12 posts

Posted 16 April 2007 - 12:06 PM

Of course you are right.
These 2 files have various formats.
But the Search-Engines' teams are very strong teams, and they can perform this change.
I think that the robots.txt should also be transformed to the XML format.
It will save much work for the bots. Now they must identify and read 2 files.

I'm not an American. But you can have more influence over this improvement.


#9 qwerty

qwerty

    HR 10

  • Moderator
  • 8,296 posts
  • Location:Somerville, MA

Posted 16 April 2007 - 01:53 PM

Here's a tiny complication. When the announcement came out about adding the URL of your sitemap to the robots.txt file, I set one up for a client of mine whose robots.txt had been empty, so now its only content is
CODE
Sitemap: http://www.domain.com/sitemap.xml.gz

So what happens when I check the robots.txt file in Google Webmaster Tools?
QUOTE
robots.txt file does not appear to be valid

I'm guessing it's just that this particular tool hasn't been updated to recognize the update to the protocol, but I'm going to see if I can get an answer from Google.

#10 torka

torka

    Vintage Babe

  • Moderator
  • 4,392 posts
  • Location:Triangle area, NC, USA, Earth (usually)

Posted 16 April 2007 - 02:30 PM

Wonder what it would say if you added
CODE
User-agent: *
Disallow:

to the robots.txt file? Maybe it's looking for a User-agent: bit to start things off before you hit the Sitemap: part?

Will be interesting to hear what Google has to say by way of explanation/workaround...

--Torka
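As a rough way to try the idea locally, the sketch below feeds a robots.txt with both the allow-all record and the Sitemap line to Python's standard-library parser. This shows only how `urllib.robotparser` reads the file and says nothing about what Google's tool does. The helper name `check` and the page URL are made up for the example, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.domain.com/sitemap.xml.gz
"""


def check(robots_txt, page_url):
    """Parse robots.txt text; report whether page_url is crawlable and list sitemaps."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.can_fetch("*", page_url), parser.site_maps() or []


allowed, sitemaps = check(ROBOTS_TXT, "http://www.domain.com/any-page.html")
print(allowed, sitemaps)
```

An empty `Disallow:` blocks nothing, so the extra record should not change what gets crawled; it just gives a stricter validator a User-agent record to find.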

#11 qwerty

qwerty

    HR 10

  • Moderator
  • 8,296 posts
  • Location:Somerville, MA

Posted 16 April 2007 - 02:38 PM

I got a response, but not from a Googler. The person who responded is very active in the forum there, and s/he says that I was right in thinking it's just a matter of the webmaster tool needing to be updated.

<added>Here's some official info from Vanessa Fox:
QUOTE
We're adding support for the new instruction in our robots.txt analysis tool. The tool should be updated soon!


#12 Nueromancer

Nueromancer

    HR 5

  • Active Members
  • 301 posts
  • Location:Bedford Uk

Posted 17 April 2007 - 10:56 AM

QUOTE(Shlomo @ Apr 16 2007, 06:06 PM)
Of course you are right.
These 2 files have various formats.
But the Search-Engines' teams are very strong teams, and they can perform this change.
I think that the robots.txt should also be transformed to the XML format.
It will save much work for the bots. Now they must identify and read 2 files.

I'm not an American. But you can have more influence over this improvement.


"Search-Engines' teams are very strong teams"

Yeah, right. Why isn't the SSL bug fixed on the UK version of Base, then? :-) Look at the discussion on Matt's blog about how G handles robots.txt. I'm with the camp that says they parse incorrectly in certain degenerate cases.

The robots.txt standard has been around for a number of years, and trying to fit a totally different protocol (sitemaps) into it is not a good idea. I don't see why the rule couldn't simply have been that the sitemap.xml file goes at the root of a site, and leave it at that.

oh just noticed 100th post W00t!

#13 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 17 April 2007 - 11:57 AM

W00t indeed! Congrats on 100.

I agree with your general sentiments. I too don't get why the engines would want to use robots.txt to do this. It only muddies the water further IMO. But on the other hand I can see why they might not want to force everybody to use the same name for a sitemap.xml file. Not all software packages that are out there can be easily tweaked to use the same xml filename.

Easy solution. Instead of querying the site only for a robots.txt file, as they all already do every visit, also query for another plain text file called sitemap.txt. Then anyone who wanted to provide a sitemap could put the location info in this separate file. Maybe even slap a second line in there to allow people to help them out by identifying the filetype.
CODE
Filetype: xml-gzip
Sitemap: http://www.domain.com/sitemap.xml.gz
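To make the proposal concrete, here is a small Python sketch of how a bot could read such a file. Keep in mind that sitemap.txt is only the idea floated in this post and is not part of any protocol. The function name `parse_sitemap_txt` and the `Filetype` spelling of the field are assumptions for the example.

```python
def parse_sitemap_txt(text):
    """Parse the hypothetical two-line sitemap.txt format proposed above."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, because the URL contains colons too.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip().lower()] = value.strip()
    return info


proposal = "Filetype: xml-gzip\nSitemap: http://www.domain.com/sitemap.xml.gz\n"
print(parse_sitemap_txt(proposal))
```

With the file type stated up front, the spider would know whether to gunzip the sitemap before parsing it, rather than guessing from the file extension.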


#14 qwerty

qwerty

    HR 10

  • Moderator
  • 8,296 posts
  • Location:Somerville, MA

Posted 17 April 2007 - 01:56 PM

Gah. You want them to change the protocol so that everyone creates a new file that will always be in the same place so the bot can find it easily, and that file will serve the purpose of indicating the location of another file that will always be in the same place? Why not skip part one, and even skip pointing to it from robots.txt, and just have us all call our sitemap file either domain.com/sitemap.xml or domain.com/sitemap.xml.gz, and then set the bots to always check for it at those two URLs, the same way they always look for robots.txt at domain.com/robots.txt?

Autodiscovery without having to tell the spiders to autodiscover something seems a lot more like autodiscovery to me.
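A minimal Python sketch of that fixed-location approach: given any URL on a site, build the two addresses a bot would probe. No engine is known to work this way; the function name `candidate_sitemap_urls` and the sample page URL are invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# The two fixed file names suggested above.
WELL_KNOWN_PATHS = ("/sitemap.xml", "/sitemap.xml.gz")


def candidate_sitemap_urls(any_url_on_site):
    """Build the fixed sitemap URLs a bot would probe for a given site."""
    parts = urlsplit(any_url_on_site)
    # Keep only scheme and host; drop the path, query and fragment.
    return [urlunsplit((parts.scheme, parts.netloc, path, "", ""))
            for path in WELL_KNOWN_PATHS]


print(candidate_sitemap_urls("http://domain.com/products/widget.html"))
```

This is the same trick bots already use for robots.txt: the location is implied by the domain, so nothing has to point to it.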

#15 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 17 April 2007 - 03:29 PM

ROFL. A new Sitemaps protocol rather than having to fudge around with robots.txt, Bob, that's all I'm saying. robots.txt already has enough issues with different engines supporting different non-standard features.

The sitemap protocol could automatically tell the spider to look for sitemap.xml, sure. But if the tool that made it doesn't use that exact name, and it's a constantly updating page (as so many are) that the webmaster couldn't rename by hand, that particular site would be left high and dry.

So in my view with sitemap.txt they give full control to the webmaster, just like with robots.txt. They specify the sitemap filename, and can even include filetype info so that the spider knows what to do with it when it gets it.



