
Robots.txt



#1 mauro21pl

    HR 1

  • Members
  • 6 posts

Posted 19 May 2009 - 07:01 AM

Hi,
I created a site a while ago. Google indexed about 140 pages of it, and I don't want most of those pages to be indexed anymore. I read an article saying it's possible to remove pages from Google's index by creating an appropriate robots.txt. Is that true? How often does the Google crawler check the robots.txt file?
Thanks

#2 NASA

    HR 4

  • Active Members
  • 183 posts

Posted 19 May 2009 - 07:10 AM

Well, according to Google, it's for blocking indexing and for removing your site from the index.


#3 mauro21pl

    HR 1

  • Members
  • 6 posts

Posted 19 May 2009 - 08:33 AM

OK, so I want to remove some pages from my site. It looks like Google has crawled the site since then. In Google Webmaster Tools > Tools > Analyze robots.txt I can see that the status is 200, which is good, and that the file was last downloaded 19 hours ago. The content shown there is exactly the same as what I have in my robots.txt file, and it seems to be perfectly fine. When I test my pages in the box provided in Google Webmaster Tools > Tools > Analyze robots.txt, it seems to be working just the way I want. However, the pages are still in the index. How come? Is there a set amount of time I have to wait before the pages are removed?

[Live url removed per [url=http://www.highrankings.com/forum/index.php?act=boardrules]Forum Rules[/url].]

thanks

#4 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 19 May 2009 - 08:45 AM

1. The robots.txt is read each time the search engine spiders start a new run.

2. Since the pages are already indexed it'll take a bit of time to get them de-indexed after making your robots.txt changes. Give it a few weeks.

3. If you want to ban all compliant spiders from everything, a simple robots.txt that says

CODE
User-agent: *
Disallow: /


will do the trick. That's not what you have in your robots.txt currently, though. Instead, you're disallowing all bots from some subdirectories and disallowing a few specific bots from everything, and none of those latter blocks mentions Googlebot. So as you have things set up now, Googlebot will only stay out of the subdirectories listed in the first block of your robots.txt, since that block applies to all spiders. That's not going to keep it away from all of your pages, though (see the sketch below).
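
As a rough illustration of that layout (the directory and bot names here are invented, since the actual file wasn't posted): a first block that keeps all spiders out of a few subdirectories, followed by blocks that ban specific bots entirely, none of which name Googlebot.

CODE
# Applies to all spiders, Googlebot included (placeholder directory names)
User-agent: *
Disallow: /private/
Disallow: /temp/

# Bans one specific bot from everything (hypothetical bot name)
User-agent: SomeBot
Disallow: /


Googlebot matches only the User-agent: * block, so it skips the listed subdirectories but is free to crawl everything else on the site.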

#5 mauro21pl

    HR 1

  • Members
  • 6 posts

Posted 19 May 2009 - 08:58 AM

That is exactly right. I want Google to de-index everything that's in the folders on the site; the only page that should stay indexed is the main one, which happens to be index.html. So from what you're saying, Googlebot will actually do what I want. That's great. I'll wait a few weeks and then see.
Thanks
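
A minimal sketch of a robots.txt matching that goal: block all compliant spiders from the subdirectories while leaving the root index.html crawlable. The folder names below are placeholders, since the site's actual directory structure wasn't posted.

CODE
# Placeholder folder names -- substitute the site's real subdirectories
User-agent: *
Disallow: /folder1/
Disallow: /folder2/


Since nothing disallows the root, index.html stays crawlable; pages already indexed inside the blocked folders should then drop out over the following weeks, as noted above.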



