Hello
We run an ecommerce site with over 5,000 products. We have a Google sitemap and regularly use Google Webmaster Tools to check for issues. I have found two points that need explanation, and I'd like to know how to get rid of them.
1. When I check my site for cached pages in Google, lots of pages are cached with https (secure mode). How can I avoid this? Is there anything I can do with .htaccess or robots.txt? Generally we enable https mode only after the user puts items in the cart, so I'm not sure how Google cached https pages.
2. When I check "Links to your site" in Google Webmaster Tools for, say, one page that has 100+ backlinks, clicking on it shows the IP address of our own domain as the source of the backlinks for that page. I'm not sure how the IP got into the external backlinks. Apart from the IP, no other pages are shown linking to us, even though there are external sources that do provide us with backlinks. We have checked the entire site to make sure we have not used the IP anywhere in links that could get it cached. Please explain how we can get rid of it.
That's all. Thank you for reading.
regards
Kashyap
Cache Of Https And Ip Of The Same Domain
Started by kashyap_rajput, Sep 03 2009 05:33 AM
6 replies to this topic
#1
Posted 03 September 2009 - 05:33 AM
#2
Posted 03 September 2009 - 10:03 AM
nobody ??
#3
Posted 03 September 2009 - 11:03 AM
A bit impatient are we? 
For removing https secure pages from the index, here's what I do. It's a two-step process, and it's for Unix/Linux servers with .htaccess and mod_rewrite available.
First create an alternative robots.txt file for SSL pages. I call mine simply robots_ssl.txt to keep things logical. This file contains only the following:
CODE
User-agent: *
Disallow: /
Upload that to the root level of your site, in the same place you probably already have your normal robots.txt file.
Then in .htaccess you'd put the following:
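CODE
# Enable mod_rewrite (harmless if it is already switched on earlier in the file)
RewriteEngine On
# The condition applies only to the single rule that follows it
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]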
What this does is detect requests for the robots.txt file made via the server's normal secure port, number 443. When it detects one of these requests, and they should only be coming from spiders, the server quietly serves the contents of robots_ssl.txt instead, which tells the spiders to leave all of the secure pages alone. Requests for robots.txt on the normal port still get your regular file.
For the second question, it sounds like your server is configured to allow surfing via the IP number. I'll have to think on that one for a bit to see if there's a way to overcome it via .htaccess or other Apache configuration. In essence the problem gets exacerbated if you're using relative links. If you were using links that had the fully qualified domain name (e.g. pointing to www.somesite.com/somefile.html rather than simply /somefile.html) the problem would not spread. Even if the spider landed on an IP number page, it would get correct links to follow.
Like I said, I'll need to think on that one for a bit. From the .htaccess perspective you'd need something like {THE_REQUEST}, but THE_REQUEST doesn't include hostname info, just the request line (method, path and protocol). And I'd be wary of putting the IP number in a {HTTP_HOST} condition, since my worry is that it would attempt to redirect all pages. My guess before thinking about it is that you'll probably end up with a deeper-level fix in the Apache configuration for the site if you want or need to do a redirect. So it might be simpler to change the way you're linking to your own internal pages.
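On the {HTTP_HOST} point: as far as the mod_rewrite documentation describes it, HTTP_HOST carries the Host header of each individual request, so a condition that matches only the IP number should fire only for visits made via the IP and leave normal domain requests alone. A minimal sketch of that idea, untested against this site, assuming 192.0.2.10 stands in for the server's IP number and www.example.com for the real domain (both are placeholders, not values from this thread):

```apache
RewriteEngine On
# Placeholder IP: replace 192.0.2.10 with the server's actual address
RewriteCond %{HTTP_HOST} ^192\.0\.2\.10(:[0-9]+)?$
# Skip secure requests so cart/checkout pages are not bounced to http
RewriteCond %{SERVER_PORT} !^443$
# Placeholder domain: send the same path to the canonical hostname
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Because the redirect is permanent (301), a spider that lands on an IP number URL would be pointed back to the domain version, which should let the relative links stay as they are. Whether this is usable on a shared host depends on whether requests made by IP reach the site's .htaccess at all.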
Moving this question to the robots.txt section of the forum since that's where part of the question belongs.
#4
Posted 03 September 2009 - 11:51 PM
Apologies for being impatient.
I was frustrated with the above problem.
Thanks a lot, Randy, you are a life saver. I understood the SSL issue and am going to implement it right away. I'm not sure about the second part, about the IP showing as external links. And yes, you are right: we have an ecommerce site and we use relative links (/pagename.html) rather than http://www.example.com/pagename.html, because when you are in secure mode (https) an absolute URL with http causes insecure-content conflicts on that page, and we lose trust because of that. Please let me know if there is anything I can do myself to resolve the issue of the IP showing as external links, or give me an explanation that I can forward to hosting support to fix.
Thanks once again, Randy.
regards
Kashyap
#5
Posted 10 September 2009 - 09:55 AM
Can anybody help me solve the IP issue (it's the IP of my own domain) that Webmaster Tools shows as an external link? How can I stop Google from crawling the domain's pages via the IP? I think this will again create a duplicate text problem.
#6
Posted 10 September 2009 - 11:15 AM
I don't know that you're going to find any solution without having pretty deep access to Apache or whatever is running as the web server, which you're not going to have unless you actually run the server yourself. Well, other than changing from relative to absolute links to stop the flow from one page to the next. That would stop it from happening over time.
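One possibility that needs only .htaccess, offered as an untested sketch: the robots_ssl.txt trick from earlier in the thread could in principle be reused for requests made via the IP number. This assumes the disallow-all robots_ssl.txt file already sits in the site root, that requests by IP actually reach this .htaccess, and that 192.0.2.10 is a placeholder for the server's real address:

```apache
RewriteEngine On
# Placeholder IP: replace 192.0.2.10 with the server's actual address
RewriteCond %{HTTP_HOST} ^192\.0\.2\.10(:[0-9]+)?$
# Spiders asking the IP host for robots.txt get the disallow-all file
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
```

Spiders fetch robots.txt separately for each hostname, so the domain version would keep its normal robots.txt while the IP version is told to stay out.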
You haven't said though if this is an actual problem or simply a Webmaster Tools problem. For instance, if you search for some specific text on one of your pages does it show both the domain name version and the IP number version? If the IP number version isn't in their index you don't really have a problem. It's just a webmaster tools thing.
Why don't you shoot a message off to the Webmaster Tools folks or ask your question in the WMT Group over at Google Groups. They should be able to fix this issue for you.
#7
Posted 11 September 2009 - 01:31 AM
Hello
I think it's just Google Webmaster Tools showing external links from our own IP. When I check in Google with link:domain, it doesn't show the IP, and it also doesn't show any of the third-party external links. It's a good suggestion to contact the Webmaster Tools group, so they might help with this. Thanks a lot.
regards
Kashyap