
Web Site Index Status


9 replies to this topic

#1 EddyGonzalez

EddyGonzalez

    HR 3

  • Active Members
  • 90 posts

Posted 17 September 2014 - 06:25 AM

In WMT, under Google Index > Index Status > Advanced, the number of pages blocked by robots.txt has steadily increased since June: http://screencast.com/t/TdfWThWh4qKU I have checked the robots.txt file and tested some URLs against it, and it all seems to be fine. Something is clearly wrong, especially as the increase in blocked pages seems to coincide with the launch of a new site.

 

Any help/advice would be appreciated.

 

cheers

Eddy

 



#2 qwerty

qwerty

    HR 10

  • Moderator
  • 8,695 posts
  • Location:Somerville, MA

Posted 17 September 2014 - 07:08 AM

It may be that the search engine is being made aware of more URLs (whether pages exist at those URLs or not) that are located in a directory that's disallowed. Do you see any patterns to the URLs that are being added to the list, like they're in the same directory, or maybe they're internal search result pages?
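
One way to look for such patterns offline is to run a list of known URLs (from server logs or a crawl, for example) against the robots.txt rules and group the blocked ones by top-level directory. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the example.com URLs, and the function name `blocked_by_directory` are all made up for illustration. Note that this parser does simple prefix matching and does not implement the wildcard extensions some crawlers support, so it may not agree with Google's own tester on every URL.

```python
from collections import Counter
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /tmp/
"""


def blocked_by_directory(robots_txt, urls, user_agent="Googlebot"):
    """Count blocked URLs, keyed by their first path segment."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    counts = Counter()
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            segments = [s for s in urlsplit(url).path.split("/") if s]
            top = "/" + segments[0] + "/" if segments else "/"
            counts[top] += 1
    return counts


# Hypothetical URLs standing in for a real crawl or log export.
urls = [
    "http://www.example.com/search/?q=widgets",
    "http://www.example.com/search/?q=gadgets",
    "http://www.example.com/products/widget.html",
    "http://www.example.com/tmp/old.html",
]
result = blocked_by_directory(ROBOTS_TXT, urls)
print(result.most_common())
```

If one directory dominates the counts, that is a reasonable place to start looking.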



#3 EddyGonzalez

EddyGonzalez

    HR 3

  • Active Members
  • 90 posts

Posted 17 September 2014 - 07:49 AM

Is there a way I can view which URLs are being blocked? I can't seem to find anywhere in WMT that shows this info. The search results are all blocked, but I can't see whether pages from the search folder are the ones showing up as blocked.

Would pages with meta nofollow, noindex appear on WMT?



#4 chrishirst

chrishirst

    A not so moderate moderator.

  • Moderator
  • 7,718 posts
  • Location:Blackpool UK

Posted 17 September 2014 - 09:18 AM

 

Would pages with meta nofollow, noindex appear on WMT?

 

Yes, if Google 'knows' about the URLs, because in a seemingly contradictory or paradoxical way, the pages have to be indexed to know they should not be indexed.



#5 qwerty

qwerty

    HR 10

  • Moderator
  • 8,695 posts
  • Location:Somerville, MA

Posted 17 September 2014 - 09:42 AM

Is there a way I can view which URLs are being blocked?

I don't know of any way to do that via WMT, but you could run a site: search and look for snippets that read, "A description for this result is not available because of this site's robots.txt." Not very convenient, but it should work, as it will list URLs Google's aware of but can't crawl.

 

If you have suspicions about particular directories, make that part of the search, e.g. [site:domain.com/search/]



#6 EddyGonzalez

EddyGonzalez

    HR 3

  • Active Members
  • 90 posts

Posted 17 September 2014 - 02:29 PM

 

Yes, if Google 'knows' about the URLs, because in a seemingly contradictory or paradoxical way, the pages have to be indexed to know they should not be indexed.

 

A large number of pages had the following code:

<META NAME="ROBOTS" CONTENT="*">

Could this have affected the site by blocking all these pages?



#7 chrishirst

chrishirst

    A not so moderate moderator.

  • Moderator
  • 7,718 posts
  • Location:Blackpool UK

Posted 17 September 2014 - 03:39 PM

The * is NOT a valid directive, so it will have been taken to indicate total exclusion rather than the default of no exclusions.
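
To find which pages carry a value like this, the templates or saved HTML can be scanned for robots meta tags whose content is not a recognised value. Below is a rough sketch using Python's standard `html.parser`; the `KNOWN_VALUES` set, the class name, and the function name are assumptions for illustration, and the set is not an official or complete list. The script only flags unrecognised values; it cannot tell you how any particular crawler will interpret them.

```python
from html.parser import HTMLParser

# Assumed list of commonly documented robots meta values (not exhaustive).
KNOWN_VALUES = {
    "all", "none", "index", "noindex",
    "follow", "nofollow", "noarchive", "nosnippet",
}


class RobotsMetaFinder(HTMLParser):
    """Collects the comma-separated values of every robots meta tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.values.append(part)


def unrecognized_robots_values(html):
    """Return robots meta values that are not in KNOWN_VALUES."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return [v for v in finder.values if v not in KNOWN_VALUES]


print(unrecognized_robots_values('<META NAME="ROBOTS" CONTENT="*">'))
```

Running it over each page's HTML and listing the pages with a non-empty result gives the set of pages to fix.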



#8 EddyGonzalez

EddyGonzalez

    HR 3

  • Active Members
  • 90 posts

Posted 18 September 2014 - 07:26 AM

Great, thanks.

 

 

Yes, if Google 'knows' about the URLs, because in a seemingly contradictory or paradoxical way, the pages have to be indexed to know they should not be indexed.

 

So even if a page is being excluded from being indexed either by using the robots meta tag or a robots.txt file, the page may still show in the index due to links, sitemaps etc?



#9 qwerty

qwerty

    HR 10

  • Moderator
  • 8,695 posts
  • Location:Somerville, MA

Posted 18 September 2014 - 07:32 AM

Yes. They don't know the content on the page, but they're aware of the URL and links pointing to it. But you should try to avoid that happening by making sure you don't include such pages in your sitemap. You can't control whether or not other sites link to those pages, of course.

 

And for the most part, it doesn't matter. If Google has reason to believe the page exists, but can't crawl it, you're not wasting any of your crawl quota on it, and it's not likely to be returned for anything but some very, very specific queries.
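
The sitemap advice above can be checked mechanically by testing every `<loc>` in the sitemap against the robots.txt rules. This is a minimal sketch with Python's standard library; the sitemap and robots.txt contents, the example.com URLs, and the function name `blocked_sitemap_urls` are hypothetical. As before, `urllib.robotparser` uses plain prefix matching, so treat the result as a first pass rather than a definitive answer.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical inputs; substitute the site's real files.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/search/?q=widgets</loc></url>
</urlset>
"""


def blocked_sitemap_urls(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """Return sitemap URLs that the robots.txt rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [
        el.text.strip()
        for el in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if el.text
    ]
    return [url for url in locs if not parser.can_fetch(user_agent, url)]


blocked = blocked_sitemap_urls(SITEMAP, ROBOTS_TXT)
print(blocked)
```

Any URL it reports is one you are both submitting to the search engine and telling it not to crawl, so it is a candidate for removal from the sitemap.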



#10 chrishirst

chrishirst

    A not so moderate moderator.

  • Moderator
  • 7,718 posts
  • Location:Blackpool UK

Posted 18 September 2014 - 07:49 AM

http://webmaster-tal...tips/28-qt-pips






