
Cache Of Https And Ip Of The Same Domain


6 replies to this topic

#1 kashyap_rajput

    HR 4

  • Active Members
  • 127 posts
  • Location:India

Posted 03 September 2009 - 05:33 AM

Hello

We run an ecommerce site with over 5,000 products. We have a Google sitemap and regularly use Google Webmaster Tools to check for issues. There are two things I have found that need explanation, and I'd like to know how to get rid of them:

1. When I check Google for cached pages of my site, I see lots of pages cached under https (secure mode). How can I avoid this? Is there anything I can do with .htaccess or robots.txt? We generally only enable https after the user adds items to the cart, so I'm not sure how Google came to cache the https pages.

2. When I check "Links to your site" in Google Webmaster Tools, say for a page with 100+ backlinks, clicking on it shows the IP address of our own domain as a source of backlinks to that page. I'm not sure how our IP got into the external backlinks. Apart from the IP, no other pages are shown linking to us, even though there are external sources that do link to us. We have checked the entire site to make sure we have not used the IP anywhere in our links. Please explain how we can get rid of this.

That's all. Thank you for reading.

regards
Kashyap

#2 kashyap_rajput

    HR 4

  • Active Members
  • 127 posts
  • Location:India

Posted 03 September 2009 - 10:03 AM

nobody ??

#3 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 03 September 2009 - 11:03 AM

A bit impatient, are we? :)

For removing https secure pages from the index, here's what I do. It's a two-step process, and it applies to Unix/Linux servers with .htaccess and mod_rewrite available.

First, create an alternative robots.txt file for SSL pages. I call mine simply robots_ssl.txt to keep things logical. This file contains only the following:

CODE
User-agent: *
Disallow: /


Upload that to the root level of your site, in the same place you probably already have your normal robots.txt file.

Then in .htaccess you'd put the following:

CODE
RewriteEngine On
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]


What this does is detect requests for the robots.txt file made via the server's standard secure port, 443. Such requests should only be coming from spiders; when one arrives, the rule silently serves robots_ssl.txt instead, which tells the spider to leave all of the secure pages alone.
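A complementary trick, sketched here as an assumption rather than part of the setup above (it additionally requires mod_headers to be enabled), is to attach a noindex header to every response served over the secure port, which asks engines to drop https pages they have already fetched:

```apache
# Sketch, assuming mod_rewrite AND mod_headers are available.
# Flag requests arriving on the secure port with an environment variable,
# then add a noindex robots header only to those responses.
RewriteEngine On
RewriteCond %{SERVER_PORT} ^443$
RewriteRule .* - [E=IS_SSL:1]
Header set X-Robots-Tag "noindex, nofollow" env=IS_SSL
```

This only discourages indexing of pages that do get crawled; the robots_ssl.txt approach above remains the primary fix.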

For the second question, it sounds like your server is configured to allow browsing via the IP address. I'll have to think on that one for a bit to see if there's a way to overcome it via .htaccess or other Apache configuration. In essence, the problem is exacerbated if you're using relative links. If your links used the fully qualified domain name (e.g. pointing to www.somesite.com/somefile.html rather than simply /somefile.html), the problem would not exist: even if a spider landed on an IP-address page, it would find correct links to follow.

Like I said, I'll need to think on that one for a bit. From the .htaccess perspective, {THE_REQUEST} won't help on its own, because it contains only the method and file path, not the hostname. A condition on {HTTP_HOST} that matches the bare IP is the more promising angle, since HTTP_HOST reflects the Host header the client actually sent; whether it behaves that way on your server depends on how your host has Apache configured. You may still end up needing a deeper fix in the Apache configuration for the site if you want a redirect, so it might be simpler to change the way you're linking to your own internal pages.
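For reference, the usual host-based redirect sketch looks like the following. Everything in it is a placeholder (203.0.113.10 stands in for the server's real IP and www.example.com for the real hostname), and it is untested against this particular host, so treat it as something to run past hosting support rather than a drop-in fix:

```apache
# Sketch: if the Host header the client sent is the bare IP,
# 301-redirect to the same path on the canonical hostname.
# 203.0.113.10 and www.example.com are placeholders.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^203\.0\.113\.10$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

If the IP-hosted site is served by a different virtual host than the .htaccess file lives in, this rule will never fire, which is why deeper Apache access may be needed.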

Moving this question to the robots.txt section of the forum since that's where part of the question belongs.

#4 kashyap_rajput

    HR 4

  • Active Members
  • 127 posts
  • Location:India

Posted 03 September 2009 - 11:51 PM

Apologies for being impatient :) I was frustrated with the above problem.

And thanks a lot, Randy, you are a life saver. I understand the SSL issue now and am going to implement the fix right away. I'm still not sure about the second part, the IP showing as external links. And yes, you are right: we have an ecommerce site, and we used relative links (/pagename.html rather than http://www.example.com/pagename.html) because when the visitor is in secure mode (https), absolute http URLs trigger insecure-content warnings on the page, and we lose the visitor's trust because of that. Please let me know if there is anything I can do myself to resolve the IP-as-external-links issue, or give me an explanation that I can forward to hosting support to fix it.
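One possible middle ground for the mixed-content concern, assuming ordinary HTML links (a sketch, not something suggested in this thread): protocol-relative URLs carry the full hostname but inherit http or https from the page they appear on, so they avoid both the insecure-content warning and the bare-path problem. www.example.com is a placeholder:

```html
<!-- Protocol-relative link: the browser reuses the current page's
     scheme (http or https), so no insecure-content warning appears
     on secure pages, yet a spider on an IP-address page still sees
     the canonical hostname. -->
<a href="//www.example.com/pagename.html">Product page</a>
```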

Thanks once again, Randy.

regards
Kashyap

#5 kashyap_rajput

    HR 4

  • Active Members
  • 127 posts
  • Location:India

Posted 10 September 2009 - 09:55 AM

Can anybody help me solve the IP issue (it's our own domain's IP) that Webmaster Tools shows as an external link? How can I block Google from crawling the domain's pages via the IP? I think that will otherwise create a duplicate-content problem as well.

#6 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 10 September 2009 - 11:15 AM

I don't know that you're going to find any solution without pretty deep access to Apache, or whatever is running as the web server, which you're not going to have unless you actually run the server yourself. Well, other than changing from relative to absolute links to stop the flow from one page to the next; that would stop it from happening over time.

You haven't said, though, whether this is an actual indexing problem or simply a Webmaster Tools reporting quirk. For instance, if you search for some specific text from one of your pages, does Google show both the domain-name version and the IP-address version? If the IP version isn't in the index, you don't really have a problem; it's just a Webmaster Tools thing.

Why don't you shoot a message off to the Webmaster Tools folks or ask your question in the WMT Group over at Google Groups. They should be able to fix this issue for you.

#7 kashyap_rajput

    HR 4

  • Active Members
  • 127 posts
  • Location:India

Posted 11 September 2009 - 01:31 AM

Hello

I think it's just Google Webmaster Tools showing external links from our own IP. When I check in Google with link:domain, it doesn't show the IP, though it also doesn't show any of the third-party external links. Contacting the Webmaster Tools group is a good suggestion; they might be able to help with this. Thanks a lot.

regards
Kashyap



