Hi,
I created a site a while ago. Google indexed about 140 pages of that site, and I no longer want most of them to be indexed. I read an article saying it is possible to remove pages from the Google index by creating an appropriate robots.txt. Is that true? How often is the robots.txt file checked by the Google crawler?
Thanks
Robots.txt
Started by mauro21pl, May 19 2009 07:01 AM
4 replies to this topic
#1
Posted 19 May 2009 - 07:01 AM
#2
Posted 19 May 2009 - 07:10 AM
Well, according to Google, it's for blocking indexing and removing your site from the index.
#3
Posted 19 May 2009 - 08:33 AM
OK, so I want to remove some of my site's pages from the index. It looks like Google has crawled the site, I think. In Google Webmaster Tools > Tools > Analyze robots.txt I can see that the status is 200, which is good, and the last downloaded time is 19 hours ago. The content of the robots.txt file shown there is exactly the same as what I have in my file, and it seems to be perfectly fine. When I test it against my pages in the box provided in Google Webmaster Tools > Tools > Analyze robots.txt, it seems to be working just as I would like. However, the pages are still there. How come? Is there a set amount of time I have to wait until the pages are removed?
[Live url removed per [url=http://www.highrankings.com/forum/index.php?act=boardrules]Forum Rules[/url].]
thanks
#4
Posted 19 May 2009 - 08:45 AM
1. The robots.txt is read each time the search engine spiders start a new run.
2. Since the pages are already indexed it'll take a bit of time to get them de-indexed after making your robots.txt changes. Give it a few weeks.
3. If you want to ban all compliant spiders from everything, a simple robots.txt that says
CODE
User-agent: *
Disallow: /
will do the trick. That's not what you have in your robots.txt currently. Instead, you're disallowing all bots from some subdirectories and disallowing some bots from everything, and none of the bots in that latter category is Googlebot. So as you now have things set up, Googlebot will only ignore the list of subdirectories in the first block of your robots.txt, since that block applies to all spiders. That's not going to keep it away from all of your pages, though.
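If you'd like to check this kind of thing yourself before waiting on a crawl, here is a minimal sketch using Python's standard `urllib.robotparser` module. It compares the blanket ban above with a file shaped like the one described in this post. Note that `PARTIAL` is a made-up stand-in: the directory names, the `BadBot` user agent, and the `www.example.com` URLs are all assumptions for illustration, not the poster's actual file. Python's parser also may not match Googlebot's behaviour in every edge case, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The blanket ban from the post above: every compliant bot, every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical stand-in for the poster's file (names are assumptions):
# all bots are kept out of a few subdirectories, and one named bot
# (not Googlebot) is kept out of everything.
PARTIAL = """\
User-agent: *
Disallow: /private/
Disallow: /archive/

User-agent: BadBot
Disallow: /
"""

if __name__ == "__main__":
    for path in ("/private/a.html", "/articles/b.html"):
        url = "http://www.example.com" + path
        print(
            path,
            "block-all:", allowed(BLOCK_ALL, "Googlebot", url),
            "partial:", allowed(PARTIAL, "Googlebot", url),
        )
```

With the partial file, Googlebot is only kept out of the listed subdirectories, which is the situation described above; with the blanket ban it is kept out of every path.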
#5
Posted 19 May 2009 - 08:58 AM
That is exactly right. I want Google to deindex everything that is in the folders on my site. The only page indexed should be the main one, which happens to be index.html. So, from what you are saying, Googlebot will actually do what I want. That is great. I will wait a few weeks and then see.
Thanks