Robots.txt Shows Home Page!


9 replies to this topic

#1 Say Yebo

Say Yebo

    HR 4

  • Active Members
  • 220 posts
  • Location:USA

Posted 02 August 2010 - 01:23 PM

I'm assessing a site for a potential new client and when I checked www.client.com/robots.txt to see their robots file, I got their home page.

I thought they somehow had their robots.txt file redirecting to their home page, but a quick header/redirect check shows it's a direct link. I get a 200 OK result.

I've never seen this before. What's it all about?
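
For anyone who wants to repeat the header/redirect check without an online tool, here is a minimal Python sketch. The hostname is a placeholder, the site is assumed to answer over plain HTTP, and `fetch_status` and `describe` are names made up for this illustration, not part of any tool mentioned in the thread.

```python
import http.client

REDIRECT_CODES = {301, 302, 303, 307, 308}

def fetch_status(host, path="/robots.txt", timeout=10):
    """Request one URL over plain HTTP and return (status, Location header).

    http.client never follows redirects on its own, so a 301/302 shows up
    as-is instead of being silently resolved to the final page.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def describe(status, location=None):
    """Turn a raw status code into the kind of verdict a header checker prints."""
    if status in REDIRECT_CODES:
        return f"redirect ({status}) to {location}"
    if status == 200:
        return "direct 200 OK, no redirect"
    return f"status {status}"

# Usage (needs network access; the hostname is a placeholder):
# print(describe(*fetch_status("www.client.com")))
```

A "direct 200 OK" for /robots.txt combined with home page content in the body is the symptom described above.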

#2 Ron Carnell

Ron Carnell

    HR 6

  • Moderator
  • 966 posts
  • Location:Michigan USA

Posted 02 August 2010 - 02:30 PM

Try to look at www.client.com/nopage.here and I suspect you'll see exactly the same thing. If so, it's just a very poorly executed custom 404 "plan." (sic)
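
The test suggested here can be sketched in Python as follows. This is only an illustration: `fetch` is a hypothetical callable that takes a path and returns `(status, body)`, so the logic can be tried without touching a real site, and the function names are invented for the example.

```python
import uuid

def probe_path():
    """Build a path that almost certainly does not exist on the site."""
    return f"/{uuid.uuid4().hex}.here"

def check_missing_page(fetch, home_path="/"):
    """Classify how a site answers a request for a nonexistent URL.

    `fetch` is any callable taking a path and returning (status, body).
    """
    status, body = fetch(probe_path())
    if status in (404, 410):
        return "ok: missing pages return an error status"
    if status == 200:
        # Compare against the real home page to confirm the "same thing" case.
        _, home_body = fetch(home_path)
        if body == home_body:
            return "soft 404: missing pages return the home page with 200"
        return "soft 404: missing pages return 200"
    return f"other: status {status}"
```

A random path is used instead of a fixed one like /nopage.here so that the probe cannot accidentally match a real page.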

#3 Jill

Jill

    Recovering SEO

  • Admin
  • 32,916 posts

Posted 02 August 2010 - 03:57 PM

Yep, and that's very bad news, as it means they could have tons of nonexistent URLs all being indexed with the home page content!

#4 Say Yebo

Say Yebo

    HR 4

  • Active Members
  • 220 posts
  • Location:USA

Posted 03 August 2010 - 09:46 AM

Urgh. OK, I'll have to look into that then. Thanks guys!

QUOTE(Ron Carnell @ Aug 2 2010, 03:30 PM)
Try to look at www.client.com/nopage.here and I suspect you'll see exactly the same thing. If so, it's just a very poorly executed custom 404 "plan." (sic)


Looks like you're spot on Ron! That IS the case.

QUOTE(Jill @ Aug 2 2010, 04:57 PM)
Yep, and that's very bad news, as it means they could have tons of nonexistent URLs all being indexed with the home page content!


And indeed they do. I just looked and found over 60 versions :-(

Many of the versions look like this, with the word "token" in the URL: client.org/index.cfm?CFID=31111&CFTOKEN=37878467

They have a CMS - might they be creating duplicates of their home page every time they edit it?

Is this even fixable?
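
To count how many of those indexed versions are really the same page, the URLs can be grouped after dropping the CFID and CFTOKEN parameters. The sketch below assumes those two parameters are ColdFusion session identifiers (a guess based on the .cfm extension and the parameter names), which would point to session IDs in URLs rather than CMS edits as the source of the duplicates. The function names and the http:// scheme are assumptions for the example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed to be session-only parameters that do not change page content.
SESSION_PARAMS = {"cfid", "cftoken"}

def strip_session_params(url):
    """Return the URL with CFID/CFTOKEN removed and other parameters kept."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

def group_by_canonical(urls):
    """Map each session-free URL to the list of indexed variants behind it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_session_params(url)].append(url)
    return dict(groups)
```

Feeding the list of indexed URLs to `group_by_canonical` shows at a glance which entries collapse to the same address.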

Edited by Jill, 03 August 2010 - 08:45 PM.


#5 qwerty

qwerty

    HR 10

  • Moderator
  • 8,607 posts
  • Location:Somerville, MA

Posted 03 August 2010 - 09:59 AM

Sure. Set up the server to respond properly to requests for nonexistent URLs. The next time the crawler requests those URLs it will get a 404 instead of a 200.
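
What "respond properly" means can be shown with a toy Python handler. This is not the client's ColdFusion setup, just a sketch of the principle: the visitor can still get a friendly custom error page, but the status line says 404 instead of 200. The page table and names are invented for the example.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical site content: path -> (content type, body).
PAGES = {
    "/": ("text/html; charset=utf-8", "<h1>Home</h1>"),
    "/robots.txt": ("text/plain; charset=utf-8", "User-agent: *\nDisallow:\n"),
}
NOT_FOUND = ("text/html; charset=utf-8", "<h1>Sorry, that page does not exist</h1>")

def respond(path):
    """Return (status, content type, body) for a requested path."""
    if path in PAGES:
        return (200, *PAGES[path])
    # Custom error page for the visitor, but with an honest 404 status.
    return (404, *NOT_FOUND)

class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore the query string when looking up the page.
        status, content_type, body = respond(self.path.split("?", 1)[0])
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# To try it locally:
# from http.server import HTTPServer
# HTTPServer(("127.0.0.1", 8000), SiteHandler).serve_forever()
```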

#6 Say Yebo

Say Yebo

    HR 4

  • Active Members
  • 220 posts
  • Location:USA

Posted 03 August 2010 - 10:19 AM

QUOTE(qwerty @ Aug 3 2010, 10:59 AM)
Sure. Set up the server to respond properly to requests for nonexistent URLs. The next time the crawler requests those URLs it will get a 404 instead of a 200.


Thanks Qwerty....sounds simple enough.

I've just found another issue: some of those duplicated pages are on a different domain that features the design company's name. My client is a .org, but about 20% of the duplicated home pages are on: client.designco.com/index.cfm....

Will the 404 take care of that too?

#7 Jill

Jill

    Recovering SEO

  • Admin
  • 32,916 posts

Posted 03 August 2010 - 08:46 PM

Those should be excluded via a robots.txt file if possible.

#8 Say Yebo

Say Yebo

    HR 4

  • Active Members
  • 220 posts
  • Location:USA

Posted 04 August 2010 - 09:32 AM

QUOTE(Jill @ Aug 3 2010, 09:46 PM)
Those should be excluded via a robots.txt file if possible.

Thank you Jill. Living and learning!

#9 qwerty

qwerty

    HR 10

  • Moderator
  • 8,607 posts
  • Location:Somerville, MA

Posted 04 August 2010 - 10:27 AM

I think I disagree, but it may just be that I'm not clear on the situation. If you've got URLs indexed and you don't want them indexed, then blocking them with the robots exclusion protocol won't help. Search engines already know (or believe) those pages exist, and a disallow rule only keeps them from requesting the pages again. If they don't request them, they're never going to find out that the pages don't exist.

I think you'd be better off either setting up 301s from those URLs to others, or just letting the search engines receive 404s when they request them. Those methods will eventually drop those URLs from the index.
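
The point about robots.txt hiding the 404 can be illustrated with Python's standard `urllib.robotparser`. This is a simplified model of a crawler that obeys robots.txt, not a description of how any particular search engine works; the bot name, the rules, and the URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser

def crawler_observes(robots_lines, url, server_status, agent="ExampleBot"):
    """Return the status a polite crawler would see, or None if it may not ask.

    `server_status` is what the server would answer if the URL were requested.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if not parser.can_fetch(agent, url):
        return None  # Never requested, so the 404 is never discovered.
    return server_status

blocked_rules = ["User-agent: *", "Disallow: /index.cfm"]
open_rules = ["User-agent: *", "Disallow:"]
stale_url = "http://client.designco.com/index.cfm?CFID=31111&CFTOKEN=37878467"

# With the disallow rule the crawler learns nothing about the URL.
print(crawler_observes(blocked_rules, stale_url, 404))
# Without it, the crawler sees the 404 and can drop the URL over time.
print(crawler_observes(open_rules, stale_url, 404))
```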

The part I clearly don't get is that these are apparently pages from a different domain: designco.org vs. designco.com and subdomains thereof. Is G reporting that there are .com pages on the .org domain??

#10 Say Yebo

Say Yebo

    HR 4

  • Active Members
  • 220 posts
  • Location:USA

Posted 11 August 2010 - 12:23 PM

QUOTE(qwerty @ Aug 4 2010, 11:27 AM)
The part I clearly don't get is that these are apparently pages from a different domain: designco.org vs. designco.com and subdomains thereof. Is G reporting that there are .com pages on the .org domain??


Well, Google is finding their home page content on client.org and client.designco.com. When I pointed this out, their design company 301'd the client.designco.com to client.org. But I'm not sure why that situation existed in the first place.
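
The kind of host-level 301 the design company put in place can be sketched as a small Python function. How they actually implemented it is not known; the http:// scheme and the function name are assumptions for the illustration.

```python
OLD_HOST = "client.designco.com"
NEW_ORIGIN = "http://client.org"  # scheme is an assumption

def redirect_for(host, path):
    """Return (301, new URL) for requests on the old host, else None.

    `host` is the value of the Host header; `path` includes any query string,
    so each old URL maps to its equivalent on the new domain.
    """
    hostname = host.split(":", 1)[0].lower()  # drop an optional port
    if hostname == OLD_HOST:
        return 301, NEW_ORIGIN + path
    return None
```

Redirecting path for path, instead of sending everything to the home page, keeps each old URL pointing at its real counterpart.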

As I am still waiting to see which SEO company gets awarded the job, I have not spent any more time on the subject, but I felt inclined to point out the problem as it may help us get the job. (Or maybe not!)



