4 replies to this topic

#1 mauro21pl


    HR 1

  • Members
  • Pip
  • 6 posts

Posted 19 May 2009 - 07:01 AM

I created a site a while ago. Google indexed about 140 pages of it, and I don't want most of those pages to be indexed anymore. I read an article saying it's possible to remove pages from Google's index by creating an appropriate robots.txt. Is that true? How often does the Google crawler check the robots.txt file?



    HR 4

  • Active Members
  • PipPipPipPip
  • 183 posts

Posted 19 May 2009 - 07:10 AM

Well, according to Google, robots.txt is for blocking indexing and removing your site.


#3 mauro21pl


    HR 1

  • Members
  • Pip
  • 6 posts

Posted 19 May 2009 - 08:33 AM

OK, so I want to remove some pages from my site, and it looks like Google has crawled the site. In Google Webmaster Tools > Tools > Analyze robots.txt I can see that the status is 200, which is good, and the last download time is 19 hours ago. The content shown for the robots.txt file is exactly the same as what I have in my file, and it seems to be perfectly fine. When I test my pages in the box provided in Google Webmaster Tools > Tools > Analyze robots.txt, it works just like I want it to. However, the pages are still in the index. How come? Is there some amount of time I have to wait before the pages are removed?

[Live url removed per [url=http://www.highrankings.com/forum/index.php?act=boardrules]Forum Rules[/url].]


#4 Randy


    Convert Me!

  • Moderator
  • 17,540 posts

Posted 19 May 2009 - 08:45 AM

1. The robots.txt is read each time the search engine spiders start a new run.

2. Since the pages are already indexed it'll take a bit of time to get them de-indexed after making your robots.txt changes. Give it a few weeks.

3. If you want to ban all compliant spiders from everything, a simple robots.txt that says

User-agent: *
Disallow: /

will do the trick. That's not what you have in your robots.txt currently. Instead, you're disallowing all bots from some subdirectories and disallowing certain specific bots from everything, and none of those specific-bot blocks references Googlebot. So as you have things set up now, Googlebot will only stay out of the subdirectories listed in the first block of your robots.txt, since that block applies to all spiders. That's not going to keep it away from all of your pages, though.
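To see why block matching works the way Randy describes, you can sanity-check a robots.txt locally with Python's standard `urllib.robotparser` module. The file below is a made-up stand-in for the kind of setup described above (the directory and bot names are placeholders, not the poster's actual file): one wildcard block disallowing a couple of subdirectories, plus a block banning one specific non-Google bot entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt resembling the situation in this thread:
# a wildcard block listing subdirectories, and a block that bans
# one specific bot (not Googlebot) from everything.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: SomeOtherBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot doesn't match "SomeOtherBot", so only the wildcard
# block applies to it: blocked from the listed subdirectories,
# allowed everywhere else.
print(parser.can_fetch("Googlebot", "/private/page.html"))   # False
print(parser.can_fetch("Googlebot", "/some-page.html"))      # True

# The named bot matches its own block and is banned site-wide.
print(parser.can_fetch("SomeOtherBot", "/some-page.html"))   # False
```

This mirrors the point in #4: a specific-bot block only affects the bot it names, while everyone else falls back to the `User-agent: *` block.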

#5 mauro21pl


    HR 1

  • Members
  • Pip
  • 6 posts

Posted 19 May 2009 - 08:58 AM

That's right. I want Google to deindex everything that is in the folders on my site; the only page that should stay indexed is the main one, which happens to be index.html. So from what you're saying, Googlebot will actually do what I want. That's great. I'll wait a few weeks and then see.
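For that goal (only index.html indexed, every folder blocked), a robots.txt along these lines would fit; the folder names here are placeholders, since the actual directory names weren't posted in this thread:

```
User-agent: *
Disallow: /folder1/
Disallow: /folder2/
Disallow: /images/
```

Listing each folder keeps the root-level index.html crawlable while blocking everything inside the directories. Googlebot also supports the non-standard `Allow:` directive, which some sites use to permit a single page under an otherwise disallowed path, but that's a Google extension and not all compliant spiders honor it.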
