Sitemap Protocol Updated
#1
Posted 11 April 2007 - 12:50 PM
See http://www.sitemaps....l#submit_robots
#2
Posted 11 April 2007 - 03:15 PM
I was going to attend the sitemap session here at SES, but the room was packed tight by the time I got there.
#3
Posted 11 April 2007 - 04:52 PM
Up until now I have used a sitemap generator to create my sitemap files:
One to Google specs, as Google-sitemap.xml
One to Yahoo specs, as urllist.txt
I have now generated a non-Google-specific, generic .xml file,
put it in the robots.txt,
and resubmitted it to Google Webmaster Tools as well.
Webmaster Tools rejected the generic .xml file outright.
Am I being too early an adopter here?
The two submission approaches appear to be incompatible at this time for the same generic .xml file.
The generator:
www.auditmypc.com/free-sitemap-generator.asp
Thoughts? Ideas?
Stephen V
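For anyone following along, here is a rough sketch of what a generic sitemap plus its robots.txt line might look like when produced by a script. The namespace is the one used by the sitemaps.org 0.9 schema; the example.com URLs and the function names are placeholders made up for illustration, and nothing here explains why Webmaster Tools rejected the file above.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal generic sitemap document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def robots_sitemap_line(sitemap_url):
    """Return the robots.txt line that points at the sitemap."""
    return f"Sitemap: {sitemap_url}"


# Placeholder URLs, for illustration only.
xml_doc = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about.html",
])
print(xml_doc)
print(robots_sitemap_line("http://www.example.com/sitemap.xml"))
```

The same file is meant to serve both routes: submitted directly, or discovered through the robots.txt line.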
#4
Posted 11 April 2007 - 07:33 PM
#5
Posted 11 April 2007 - 11:15 PM
My Yahoo account has "processed" the Google version.
S
#6
Posted 16 April 2007 - 08:13 AM
is an important advancement.
I think that the search engines should go one step further
and allow webmasters to include in the robots.txt
all the functions of the sitemap,
so that we will be able to provide these 2 sets of info
in one file.
#7
Posted 16 April 2007 - 11:35 AM
#8
Posted 16 April 2007 - 12:06 PM
These 2 files have different formats.
But the Search-Engines' teams are very strong teams, and they can perform this change.
I think that the robots.txt should also be converted to XML format.
It will save the bots much work. Right now they must identify and read 2 files.
I'm not an American. But you can have more influence over this improvement.
#9
Posted 16 April 2007 - 01:53 PM
So what happens when I check the robots.txt file in Google Webmaster Tools?
I'm guessing it's just that this particular tool hasn't been updated to recognize the update to the protocol, but I'm going to see if I can get an answer from Google.
#10
Posted 16 April 2007 - 02:30 PM
Disallow:
to the robots.txt file? Maybe it's looking for a User-agent: bit to start things off before you hit the Sitemap: part?
Will be interesting to hear what Google has to say by way of explanation/workaround...
--Torka
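On the ordering question: as far as the sitemaps.org write-up goes, the Sitemap line is described as independent of the User-agent line, so in principle it should not matter where it sits in the file. Here is a small sketch of a reader that treats it that way. The function name and sample files are made up for illustration, and this is not a claim about how Google's checker actually parses the file.

```python
def find_sitemaps(robots_txt):
    """Collect every Sitemap: location, wherever it appears in the file."""
    locations = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Split on the first colon only; the URL itself contains colons.
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap" and value.strip():
            locations.append(value.strip())
    return locations


# Two hypothetical robots.txt files: Sitemap line last, then first.
sitemap_last = """User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""

sitemap_first = """Sitemap: http://www.example.com/sitemap.xml

User-agent: *
Disallow:
"""

print(find_sitemaps(sitemap_last))
print(find_sitemaps(sitemap_first))
```

Both calls find the same location, which is the behaviour you would expect if position is irrelevant.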
#11
Posted 16 April 2007 - 02:38 PM
<added>Here's some official info from Vanessa Fox:
#12
Posted 17 April 2007 - 10:56 AM
"Search-Engines' teams are very strong teams"
Yeah, right. Why isn't the SSL bug fixed on the UK version of Base, then? :-) Look at the discussion on Matt's blog about how G handles robots.txt. I'm with the camp that says they parse it incorrectly in certain degenerate cases.
The robots.txt standard has been around for a number of years, and trying to fit a totally different protocol (sitemaps) into it is not a good idea. I don't see why it couldn't simply have been that the sitemaps.xml file goes at the root of a site, and leave it at that.
Oh, just noticed: 100th post. W00t!
#13
Posted 17 April 2007 - 11:57 AM
I agree with your general sentiments. I too don't get why the engines would want to use robots.txt to do this. It only muddies the water further IMO. But on the other hand I can see why they might not want to force everybody to use the same name for a sitemap.xml file. Not all software packages that are out there can be easily tweaked to use the same xml filename.
Easy solution: instead of querying the site only for a robots.txt file, as they all already do on every visit, also query for another plain text file called sitemap.txt. Then anyone who wanted to provide a sitemap could put the location info in this separate file. Maybe even slap a second line in there so people can help the engines out by identifying the filetype.
Sitemap: <http://www.domain.com/sitemap.xml.gz>
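To make the idea concrete, here is a rough sketch of how a spider might read such a file. Everything in it is hypothetical: sitemap.txt is only the proposal from this post, and the `Filetype:` field name and its `gzip` value are assumptions invented for the example, not part of any published protocol.

```python
def parse_sitemap_txt(text):
    """Read the proposed (hypothetical) sitemap.txt into a dict."""
    info = {}
    for raw_line in text.splitlines():
        # Split on the first colon only; the URL itself contains colons.
        field, sep, value = raw_line.partition(":")
        if not sep:
            continue  # not a "Field: value" line
        key = field.strip().lower()
        # Drop the angle brackets used around the URL in the example line.
        value = value.strip().strip("<>")
        if key in ("sitemap", "filetype") and value:
            info[key] = value
    return info


# Hypothetical file contents; the second line is an invented field.
sample = (
    "Sitemap: <http://www.domain.com/sitemap.xml.gz>\n"
    "Filetype: gzip\n"
)
print(parse_sitemap_txt(sample))
```

The point is only that two plain lines would be enough to carry both the location and a hint about how to read it.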
#14
Posted 17 April 2007 - 01:56 PM
Autodiscovery without having to tell the spiders to autodiscover something seems a lot more like autodiscovery to me.
#15
Posted 17 April 2007 - 03:29 PM
The sitemap protocol could automatically tell the spider to look for sitemap.xml, sure. But if the tool that generates the file doesn't use that exact name, and the file is constantly regenerated (as so many are) so that the webmaster can't rename it by hand, that particular site would be left high and dry.
So in my view with sitemap.txt they give full control to the webmaster, just like with robots.txt. They specify the sitemap filename, and can even include filetype info so that the spider knows what to do with it when it gets it.
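The two positions are not mutually exclusive. A spider could try whatever locations the webmaster declared first and only then fall back to the conventional name. A minimal sketch of that ordering follows; the function name and URLs are made up for illustration, and it does not describe how any engine actually behaves.

```python
from urllib.parse import urljoin


def sitemap_candidates(site_root, declared):
    """List locations to try: declared ones first, then the default name."""
    candidates = list(declared)
    default = urljoin(site_root, "/sitemap.xml")
    if default not in candidates:
        candidates.append(default)
    return candidates


# A site whose generator uses its own filename (hypothetical URL).
print(sitemap_candidates(
    "http://www.example.com/",
    ["http://www.example.com/maps/feed.xml.gz"],
))

# A site that declares nothing gets the conventional guess only.
print(sitemap_candidates("http://www.example.com/", []))
```

With this ordering the webmaster keeps full control when they declare a location, and sites that declare nothing still get true autodiscovery.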