Sitemap Protocol Updated


31 replies to this topic

#16 semmaster

    HR 1

  • Members
  • 1 posts

Posted 26 April 2007 - 08:39 AM

I just tested my robots.txt file and I think it was allowed with no problems, but I'm not entirely sure what the results mean. Can anyone shed some light on this for me? Here are the results.


CODE
User-agent: *
Disallow:

Sitemap: http://www.domainname.com/sitemap.xml



URL: http://www.domainname.com/
Googlebot: Allowed by line 2: Disallow:
Detected as a directory; specific files may have different restrictions
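For anyone wanting to reproduce that result offline, here is a rough sketch using Python's standard-library robots.txt parser (`urllib.robotparser`; `site_maps()` needs Python 3.8+). It is only one parser, not Googlebot itself, so treat it as an approximation, but it reads the file the same way the tool did: an empty `Disallow:` blocks nothing, so the root URL is allowed, and the `Sitemap:` line is picked up separately from the User-agent rules. The domain is the poster's placeholder; `check_robots` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# robots.txt from the post above (www.domainname.com is the poster's placeholder).
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.domainname.com/sitemap.xml
"""


def check_robots(robots_txt, user_agent, url):
    """Return (allowed, sitemaps) for url according to robots_txt."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when there are no Sitemap lines (Python 3.8+).
    return parser.can_fetch(user_agent, url), parser.site_maps() or []


if __name__ == "__main__":
    allowed, sitemaps = check_robots(ROBOTS_TXT, "Googlebot", "http://www.domainname.com/")
    print("allowed:", allowed)    # an empty Disallow: blocks nothing
    print("sitemaps:", sitemaps)  # the URL from the Sitemap: line
```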


#17 noel_x99

    HR 3

  • Active Members
  • 104 posts
  • Location:PA

Posted 26 April 2007 - 08:47 AM

Thanks for the info. I've been using sitemap.xml on a number of sites now. I didn't know to put it in the robots.txt file.

Any idea what the .gz is for in the file name sitemap.xml.gz? Just curious.


#18 qwerty

    HR 10

  • Moderator
  • 8,288 posts
  • Location:Somerville, MA

Posted 26 April 2007 - 09:31 AM

QUOTE(semmaster @ Apr 26 2007, 09:39 AM)
I just tested my robots.txt file and I think it was allowed with no problems, but I'm not entirely sure what the results mean. Can anyone shed some light on this for me? Here are the results.

User-agent: *
Disallow:

Sitemap: http://www.domainname.com/sitemap.xml

URL: http://www.domainname.com/
Googlebot: Allowed by line 2: Disallow:
Detected as a directory; specific files may have different restrictions

Your first two lines are fine. However, I believe you need to put the URL of the sitemap in <> brackets.

QUOTE(noel_x99 @ Apr 26 2007, 09:47 AM)
Thanks for the info. I've been using sitemap.xml on a number of sites now. I didn't know to put it in the robots.txt file.

Any idea what the .gz is for in the file name sitemap.xml.gz? Just curious.

It's a zip compression format.

#19 NECOWebGuy

    HR 1

  • Members
  • 4 posts

Posted 26 April 2007 - 09:58 AM

It could be the format of the file itself. I'm not familiar with the new protocol, but I do know that a file ending in ".gz" has been compressed with the gzip (GNU zip) utility. You may need to decompress it to the .xml version first. I'd bet that Google either can't or won't read GZ-compressed files. Unless you have direct server access, you're probably going to have to decompress the file locally, upload it as sitemap.xml, and then change the file name in robots.txt.

More information about .gz files is at: www.ncbi.nlm.nih.gov/Ftp/uncompress.html#gz
John
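If you do want a plain sitemap.xml from a .gz copy, the local decompression step John describes can be done with Python's standard `gzip` and `shutil` modules. A small sketch, assuming the file names from this thread; `gunzip_file` is an illustrative helper, and the demo builds its own compressed sample in a temporary directory rather than touching a real sitemap.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(src_path, dest_path):
    """Decompress a gzip file (e.g. sitemap.xml.gz) into dest_path (e.g. sitemap.xml)."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        # Stream the data so a large sitemap never has to fit in memory at once.
        shutil.copyfileobj(src, dest)


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>\n'
    with tempfile.TemporaryDirectory() as workdir:
        gz_path = os.path.join(workdir, "sitemap.xml.gz")
        xml_path = os.path.join(workdir, "sitemap.xml")
        # Create a compressed sample first, standing in for a real sitemap.xml.gz.
        with gzip.open(gz_path, "wb") as fh:
            fh.write(sample)
        gunzip_file(gz_path, xml_path)
        with open(xml_path, "rb") as fh:
            print("round trip ok:", fh.read() == sample)
```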

QUOTE(qwerty @ Apr 16 2007, 03:53 PM)
Here's a tiny complication. When the announcement came out about adding the URL of your sitemap to the robots.txt file, I set one up for a client of mine whose robots.txt had been empty, so now its only content is
CODE
Sitemap: <http://www.domain.com/sitemap.xml.gz>

So what happens when I check the robots.txt file in Google Webmaster Tools?
I'm guessing it's just that this particular tool hasn't been updated to recognize the update to the protocol, but I'm going to see if I can get an answer from Google.


#20 qwerty

    HR 10

  • Moderator
  • 8,288 posts
  • Location:Somerville, MA

Posted 26 April 2007 - 10:16 AM

Nope, it's a non-issue. Have a look at my next post in the thread.

#21 Archie

    HR 2

  • Members
  • 14 posts

Posted 27 April 2007 - 04:11 AM

QUOTE(qwerty @ Apr 26 2007, 03:31 PM)
Your first two lines are fine. However, I believe you need to put the URL of the sitemap in <> brackets.
It's a zip compression format.


QWERTY

I think you've got this wrong. Sitemaps.org states "The <sitemap_location> should be the complete URL to the Sitemap, such as: http://www.example.c...om/sitemap.xml" and in their other methods of submitting sitemaps to search engines, the <sitemap_location> in the examples shown is replaced (including the <> brackets) with the URL. A quick search on Google seems to confirm this.
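A quick way to see Archie's point: a parser treats everything after `Sitemap:` as the URL, so typed-in brackets become part of the value. The sketch below uses Python's `urllib.robotparser` (3.8+ for `site_maps()`) as a stand-in for a search engine's parser, which is an assumption since each engine has its own; `www.domain.com` is the placeholder from qwerty's example, and both helper names are illustrative.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def sitemap_values(robots_txt):
    """Return the raw Sitemap values Python's robots.txt parser extracts."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


def is_complete_url(value):
    """True if value is an absolute http(s) URL with a host name."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    plain = "Sitemap: http://www.domain.com/sitemap.xml.gz"
    bracketed = "Sitemap: <http://www.domain.com/sitemap.xml.gz>"
    for robots_txt in (plain, bracketed):
        for value in sitemap_values(robots_txt):
            # The brackets survive parsing, and the result no longer looks like a URL.
            print(value, is_complete_url(value))
```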



#22 identity

    HR 2

  • Active Members
  • 43 posts

Posted 27 April 2007 - 08:16 AM

How about having them look for sitemap in the root folder, with no extension, and that file can....

- contain the sitemap in xml format
- contain the sitemap in a gz format (or is the extension required for this?)
- contain the sitemap in a text format
- or contain a URL to the location of the sitemap

Problem solved?
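identity's idea is a proposal rather than part of the protocol, but the detection it relies on is straightforward: gzip data always starts with the bytes 0x1f 0x8b, so a reader could tell a compressed file from plain XML or text without any extension. A rough sketch of that idea; the extensionless file and the `load_sitemap`/`sniff` helpers are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def sniff(data):
    """Guess 'xml' or 'text' for uncompressed sitemap bytes."""
    # Skip a UTF-8 byte-order mark, if present, before looking at the content.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    return "xml" if data.lstrip().startswith(b"<") else "text"


def load_sitemap(data):
    """Return (format, uncompressed bytes) for a sitemap file with no extension."""
    if data[:2] == GZIP_MAGIC:
        inner = gzip.decompress(data)
        return "gzip/" + sniff(inner), inner
    return sniff(data), data


if __name__ == "__main__":
    xml = b'<?xml version="1.0" encoding="UTF-8"?><urlset/>'
    text = b"http://www.domain.com/\nhttp://www.domain.com/about.html\n"
    for sample in (xml, gzip.compress(xml), text):
        print(load_sitemap(sample)[0])
```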

#23 Archie

    HR 2

  • Members
  • 14 posts

Posted 30 April 2007 - 05:33 AM

QUOTE(identity @ Apr 27 2007, 02:16 PM)
How about having them look for sitemap in the root folder, with no extension, and that file can....

- contain the sitemap in xml format
- contain the sitemap in a gz format (or is the extension required for this?)
- contain the sitemap in a text format
- or contain a URL to the location of the sitemap

Problem solved?


Identity, that won't help with "the Sitemap protocol", which is what this thread is discussing. The Sitemap protocol requires the sitemap to be in XML, and its location must be given in the robots.txt file. See www.sitemaps.org

#24 identity

    HR 2

  • Active Members
  • 43 posts

Posted 30 April 2007 - 08:45 AM

Archie, I beg to differ.

What I'm recommending is completely in line with the sitemap protocol. All I'm recommending is that they default to a file named sitemap, with no extension, in the root folder, which eliminates the need to specify whether it is regular XML or a gzipped version (again, assuming that the gz extension isn't required). As for having a text version, I know that Yahoo allowed for that, and it seems like a nice, simple addition to the protocol... but mainly I was expressing how not needing a file extension would simplify everything.

And rather than using the robots.txt file to specify a name and file location, this sitemap file could do that instead, in cases where you want to call it something else or place it somewhere else.

As for placing information in robots.txt, I don't believe anything requires it; the option has just been made available if you wish to use it. You can still submit or ping the engines with your sitemap without using robots.txt.
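For the "ping" route, sitemaps.org describes an HTTP request of the form `<searchengine_URL>/ping?sitemap=sitemap_url`, with the sitemap URL URL-encoded. Here is a sketch that only builds that URL and sends nothing; `search.example.com` is a placeholder rather than a real engine, `build_ping_url` is an illustrative name, and engines have changed or retired their ping endpoints over the years, so check each engine's current documentation before relying on one.

```python
from urllib.parse import quote


def build_ping_url(search_engine_base, sitemap_url):
    """Build a sitemaps.org-style ping URL: <searchengine_URL>/ping?sitemap=<encoded URL>."""
    # safe="" also encodes ':' and '/', so the sitemap URL travels as a single query value.
    encoded = quote(sitemap_url, safe="")
    return f"{search_engine_base.rstrip('/')}/ping?sitemap={encoded}"


if __name__ == "__main__":
    # search.example.com is a placeholder, not a real search engine endpoint.
    print(build_ping_url("http://search.example.com/", "http://www.domain.com/sitemap.xml"))
```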

#25 escocia1

    HR 2

  • Active Members
  • 30 posts

Posted 30 April 2007 - 12:05 PM

After reading through sitemaps.org, I'm still left wondering what to put in my robots.txt file, apart from the obvious:

CODE
Sitemap: http://www.domain.com/sitemap.xml

Is there anything else I should have in there?

I'm also wondering if the following code is what should appear in sitemap.xml:

<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
http://www.google.com/schemas/sitemap/0.84...p.xsd">
<url>
<loc>http://domain.com/</loc>
<priority>1</priority>
</url>
</urlset>
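The markup above uses Google's older 0.84 schema. For comparison, here is a minimal sketch that generates the same one-URL sitemap with the sitemaps.org 0.9 namespace using Python's `xml.etree.ElementTree` (Python 3.8+ for `xml_declaration` in `tostring`). The `build_sitemap` name and its (loc, priority) input format are illustrative choices, not part of any API.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (UTF-8 bytes) for an iterable of (loc, priority) pairs.

    priority may be None to omit the tag; the protocol allows 0.0 to 1.0.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and < in text
        if priority is not None:
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority out of range for {loc}: {priority}")
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap([("http://domain.com/", 1.0)]).decode("utf-8"))
```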


#26 Archie

    HR 2

  • Members
  • 14 posts

Posted 01 May 2007 - 02:59 AM

QUOTE(identity @ Apr 30 2007, 02:45 PM)
Archie, I beg to differ.

What I'm recommending is completely in line with the sitemap protocol. All I'm recommending is that they default to a file named sitemap, with no extension, in the root folder, which eliminates the need to specify whether it is regular XML or a gzipped version (again, assuming that the gz extension isn't required). As for having a text version, I know that Yahoo allowed for that, and it seems like a nice, simple addition to the protocol... but mainly I was expressing how not needing a file extension would simplify everything.

And rather than using the robots.txt file to specify a name and file location, this sitemap file could do that instead, in cases where you want to call it something else or place it somewhere else.

As for placing information in robots.txt, I don't believe anything requires it; the option has just been made available if you wish to use it. You can still submit or ping the engines with your sitemap without using robots.txt.


My apologies Identity!
I misread the tack you were taking. In my haste I hadn't realised you were proposing a more comprehensive solution for the protocol; I thought you had misunderstood the format it currently requires.

#27 identity

    HR 2

  • Active Members
  • 43 posts

Posted 01 May 2007 - 07:53 AM

QUOTE(Archie @ May 1 2007, 02:59 AM)
My apologies Identity!


No worries, I was thinking that might be the case, but wanted to make sure that I hadn't completely gotten something wrong.

escocia1, I think you're good. The robots.txt change is extremely basic: just the one line in addition to whatever else is in your file. And I think your sitemap format looks good.

#28 ranch

    HR 2

  • Active Members
  • 47 posts
  • Location:Wonderful Copenhagen

Posted 30 July 2007 - 06:49 AM

The reason I have a link to my sitemap from my index page is to make sure that all pages are only one click away. Doing so should ensure frequent spidering, and that all of the subpages inherit some of the PageRank from my index page, even though not all subpages can be reached in one click through the normal navigation.

If I create a sitemap.xml file, it would seem superfluous to keep the sitemap.asp I have now, but if I delete it, my subpages would no longer inherit PR from the index page and would thus lose rankings.

On the other hand, if I keep my sitemap.asp, I would have to maintain two versions of the same data. Not nice either.

Right or wrong?

#29 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 30 July 2007 - 07:41 AM

You're really talking about two completely separate things, Henrik.

Your sitemap.asp file is there for humans to use, with the added advantage that the spiders can also use it. So it's a double benefit.

The xml sitemap would only be used by the search engines, and is only useful if you have pages on your site that are not being spidered for one reason or another. Real people shouldn't be using it.

If your site has a solid internal navigation scheme and no technical hurdles for the spiders to overcome, the xml sitemap is an unnecessary duplication of work IMO. I don't use 'em on any of my sites because my sites are all built to be spider friendly. If I had a site that was not being spidered because of some technical hurdle, I might create an xml sitemap; however, it's just as likely that I'd simply fix the root cause of the spidering problem so that they can spider the site normally.

#30 ranch

    HR 2

  • Active Members
  • 47 posts
  • Location:Wonderful Copenhagen

Posted 30 July 2007 - 07:47 AM

All my pages are spidered fine, so I guess I don't really need the new robots.txt feature!

Thanks, Randy!



