Yahoo Showing Pages I Do Not Have


16 replies to this topic

#1 SERPico

SERPico

    HR 4

  • Active Members
  • 249 posts

Posted 27 May 2007 - 07:16 AM

I started a thread in another forum section after reviewing my stats, and I noticed this regarding the indexing of the site I am working on:

Using Yahoo and the site:www.mydomain.com command, I just discovered a lot of indexed pages that look the same and that I did not create. The site has only been up for a week, and it certainly doesn't consist of the more than 1,000 pages that Yahoo is showing as indexed. What I find even more disturbing is that the indexed pages are all variations of the index.php home page.

The site isn't about Canon or the A300.
It's also not about Swap+Meet.
And it's not about Eurail+Com either.

Why are these kinds of pages indexed?
And it's all appended after the index.php part.

My home page ends with index.php, but why is all that other stuff turning up?

The site is brand new and in the early stages of development.


Here are some of the pages indexed like this:

www.mydomain.com/index.php?q=Canon+A300
www.mydomain.com/index.php?q=Corgi+Cars
www.mydomain.com/index.php?q=Air+Travel

#2 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 27 May 2007 - 09:02 AM

A couple of quick thoughts.

Have you run Xenu's Link Sleuth against your site to make sure nobody has hacked into it somehow and added links to these non-existent pages?

Other than this fairly remote possibility, it sounds to me like Y! has somehow gotten hold of either some old data (if the domain was owned by someone else before), or possibly found an unprotected web stats page on your site showing some referrer spam. Your stats are password protected, right?

As a general rule the spiders are not going to crawl any page, let alone rank it, unless they've found at least one link to that exact url. So something out there somewhere is leading them to believe those query strings are valid.

You didn't say: does your index.php page ever need to use any variables? If not, you could set up a simple PHP function at the top of your index.php page to look for query strings and perform a 301 back to just the domain name if one exists. That might help Y! get things straight.

#3 SERPico

Posted 27 May 2007 - 09:56 AM

QUOTE(Randy @ May 27 2007, 04:02 PM)
A couple of quick thoughts.

Have you run Xenu's Link Sleuth against your site to make sure nobody has hacked into it somehow and added links to these non-existent pages?


Hi Randy,

I appreciate your thoughts.

I downloaded and ran the suggested software.

Result:

The list of valid URLs you can submit to a search engine (e.g. Google Sitemaps) shows only URLs I know and created myself; no unfamiliar links are present.
So it's safe to say nobody has added links to the site without my knowledge.

QUOTE
Other than this fairly remote possibility, it sounds to me like Y! has somehow gotten hold of either some old data (if the domain was owned by someone else before), or possibly found an unprotected web stats page on your site showing some referrer spam. Your stats are password protected, right?


I think they are, but I'm not sure how to verify this. I haven't seen any folders mentioning "stats" or anything like that when I use FTP.

Edit: I found the stats folders, and they return 404 Not Found messages.
When I browse to a folder I use to store other pages, I get a listing of links instead, so it appears to me that the stats folder is protected, right?




I have cPanel.
Also, I don't think it's due to the domain being owned by someone else. It's a domain from the nineties, and I see exactly the same type of URL strings in my stats control panel.

So something other than outside factors must be causing this.

Also, on a side note:
Is it normal to have files in public_html and, at the same time, the same files/pages in www?


QUOTE
As a general rule the spiders are not going to crawl any page, let alone rank it, unless they've found at least one link to that exact url. So something out there somewhere is leading them to believe those query strings are valid.

Could it be an .htaccess issue?
mod_rewrite?

I'm using an .htaccess file I found on the SEOBook blog a while back to handle the non-www to www issue.

QUOTE
RewriteEngine On

RewriteCond %{HTTP_HOST} ^mydomain.com [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [L,R=301]


Those are the lines it contains. Are they all right?

QUOTE
You didn't say: does your index.php page ever need to use any variables? If not, you could set up a simple PHP function at the top of your index.php page to look for query strings and perform a 301 back to just the domain name if one exists. That might help Y! get things straight.


The page holds a PHP select menu, so that's basically why it's a .php page.
I don't think any variables would be used, then? I think?

Apologies for asking daft questions, but I'm not that technical with these types of things.

What kind of PHP code should I put in place to do what you suggested?

Also, if the stats folders are protected, what else could be causing this issue?

Thanks!

Edited by SERPico, 27 May 2007 - 10:14 AM.


#4 Jill

Jill

    Recovering SEO

  • Admin
  • 33,244 posts

Posted 27 May 2007 - 10:13 AM

Have you recently changed IP addresses? It's possible that another site on your server is sharing your IP and one or the other of the domains isn't configured correctly on the server.

#5 SERPico

Posted 27 May 2007 - 10:21 AM

Well, what did happen is that the domain was initially set up as an add-on domain of the main domain in my Bluehost.com account.
I recently changed this, as I thought it would be better to make this domain the main domain.

So those are the changes that were made recently.
I don't think they have changed the IP addresses; the only change was assigning this domain as the main domain instead of a subdomain.

Nice to see you, Jill.

#6 SERPico

Posted 27 May 2007 - 12:39 PM

I remembered that I had copied this line from the error log, but strangely it's not there anymore.

But anyhow, this is the line:
QUOTE
[Sun May 27 05:03:13 2007] [error] [client 207.46.98.116] mod_rewrite: maximum number of internal redirects reached. Assuming configuration error. Use 'RewriteOptions MaxRedirects' to increase the limit if neccessary.


I'm not familiar with what this means, though.

Secondly, the code in the .htaccess file I used is different from the one Randy had posted:

QUOTE(Posted by Randy)
CODE
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mydomain\.com [NC]
RewriteRule ^(.*)$  http://www.mydomain.com/$1  [L,R=301]


As you can see there is an extra line present:
CODE
Options +FollowSymLinks


which isn't in the code I had used in my earlier .htaccess file:

CODE
RewriteEngine On

RewriteCond %{HTTP_HOST} ^mydomain.com [NC]
RewriteRule ^(.*)$  http://www.mydomain.com/$1  [L,R=301]


There is also a blank line, which is another difference from Randy's code.

As I mentioned before, I am not that technical, but I suspect this is the underlying cause of all these duplicate home page results being indexed by Yahoo, as described in my first post?

The site is not hacked.
The stats files are protected.

What do you guys think?
And how long would it take for a fix to be reflected in Yahoo's index?

Thanks!

#7 SERPico

Posted 27 May 2007 - 05:15 PM

I'm wondering how it's possible that a URL like:

www.mydomain.com/index.php?q=sporting+goods

can actually resolve to the original home page.
That shouldn't be happening.
Thanks.

#8 Randy

Posted 27 May 2007 - 07:59 PM

Sure it should be happening.

Query strings that the page doesn't use are simply ignored. Just because they're in the URL doesn't mean they're actually called in the page.
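To illustrate the point, here is a minimal PHP sketch (assuming PHP 7 or later). render_home() is a hypothetical stand-in for a home page that never reads its query-string variables: PHP still parses ?q=Canon+A300 into a variable, but since nothing uses it, the page comes out exactly the same.

```php
<?php
// Hypothetical home page that never reads its query-string variables,
// much like a static index.php that never touches $_GET.
function render_home(array $queryVars): string
{
    // $queryVars is accepted but deliberately unused.
    return '<html><body><h1>Welcome to mydomain.com</h1></body></html>';
}

// parse_str() splits a query string into variables, much as PHP fills $_GET.
parse_str('q=Canon+A300', $withQuery);   // ['q' => 'Canon A300']
parse_str('', $withoutQuery);            // []

// The variable exists, but the page ignores it, so the HTML is identical.
echo render_home($withQuery) === render_home($withoutQuery)
    ? "identical output\n"
    : "different output\n";
```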

re: Options +FollowSymLinks

It depends upon the base configuration of Apache. If the ability to follow symbolic links is already enabled in the httpd.conf file, you don't need it. If it's not, you do. And if it's already enabled in httpd.conf, declaring it in your .htaccess too does no harm, which is why I have always recommended including it.

re: The stats possibility.

QUOTE
I think they are, but I'm not sure how to verify this. I haven't seen any folders mentioning "stats" or anything like that when I use FTP.


You likely won't see a folder. Most servers use a bit of redirection to serve the stats from a location where they don't actually exist.

When you enter the URL to view your stats into your browser are you asked for a username and password? If you are they're protected as long as there's no other path people can use to get to the stats without having to enter a username and password. If you don't have to enter a user/pass your stats are unprotected, and susceptible to referrer spamming.

You may want to use the information in our Search Engine Contact List thread (http://www.highrankings.com/forum/index.php?showtopic=19320) to drop the folks at Y! a note about the issue. At least it will give them a heads-up about non-existent pages that need to be removed from their SERPs, and they may also be able to tell you how they're coming up with those URLs if it's something on your site.

Edited by Randy, 28 May 2007 - 05:54 AM.
to fix quote


#9 SERPico

Posted 28 May 2007 - 05:12 AM

QUOTE(Randy @ May 28 2007, 02:59 AM)
Sure it should be happening.

Query strings that the page doesn't use are simply ignored. Just because they're in the URL doesn't mean they're actually called in the page.


Hi Randy,

If they're being ignored, then I understand why it still resolves. I don't follow the technical side of it, but if the extra information in the URL is ignored, that's good enough for me.

QUOTE
re: Options +FollowSymLinks

It depends upon the base configuration of Apache. If the ability to follow symbolic links is already enabled in the httpd.conf file, you don't need it. If it's not, you do. And if it's already enabled in httpd.conf, declaring it in your .htaccess too does no harm, which is why I have always recommended including it.

Thanks. By the way, how can I verify whether it's enabled in httpd.conf? I can't seem to find that file.

QUOTE
You likely won't see a folder. Most servers use a bit of redirection to serve the stats from a location where they don't actually exist.


I'm working with cPanel, and the stats folders are also in the /tmp folder: I see an awstats folder, webalizer, and a few others.
But when I enter the path as a URL, I get a 404 Not Found page, or, if I leave the www out of the URL, a "server not found" message from Firefox.


QUOTE
When you enter the URL to view your stats into your browser are you asked for a username and password? If you are they're protected as long as there's no other path people can use to get to the stats without having to enter a username and password. If you don't have to enter a user/pass your stats are unprotected, and susceptible to referrer spamming.

To review my stats, I need to log in to cPanel and click the Web/FTP Stats link, so that should be okay, I guess? There is no URL to access the stats; you can only get to them by logging in to cPanel.

QUOTE
You may want to use the information in our Search Engine Contact List thread (http://www.highrankings.com/forum/index.php?showtopic=19320) to drop the folks at Y! a note about the issue. At least it will give them a heads-up about non-existent pages that need to be removed from their SERPs, and they may also be able to tell you how they're coming up with those URLs if it's something on your site.


Thanks for that reference list. Very useful.

I did some checking, and with the help of a friend I found this Bluehost help article: helpdesk.bluehost.com/kb/index.php?x=&mod_id=2&id=410

Under part 2 it has this .htaccess code, which is supposed to redirect incoming requests for the home page to:

www.mydomain.com/index.php

Uploading this code provided by Bluehost:

CODE
RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain.com$
RewriteRule ^$ http://domain.com/index.php [L,R=301]


does not make the entered URL change to index.php.
It still looks the same, and it doesn't change to the www version either.

Adding www to the last line does fix that, though.

My questions are: first, why doesn't it jump to www.domain.com/index.php?

And second, would adding www to that line stop the code from working properly?

I think combining .htaccess code like the one you provided, Randy, with redirecting domain.com to domain.com/index.php would fix this strange indexing problem?

Since you have a good understanding of .htaccess, what code would you suggest to combine these two?

1: non-www --> www
2: domain.com --> domain.com/index.php

Anyone else who has a suggestion is very welcome to chime in too, of course.

Thanks!

#10 Randy

Posted 28 May 2007 - 06:05 AM

You'll need to ask your host why the code they provide does or doesn't work. I'd hate to guess since their server setup is probably a little bit different than mine.

FWIW, I wouldn't do the /index.php bit in there at all. Just point it to the domain name without the file name. Also realize that the provided code is only going to help with domain level queries. It's not going to correct every page.
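To make the "domain level queries only" point concrete, here is a rough PHP emulation of the Bluehost part 2 rule quoted above. It is an illustration only, not how Apache works internally, and it assumes PHP 7.1 or later; bluehost_home_redirect() is a hypothetical name, and domain.com is the placeholder from that example. It shows that ^$ matches only a request for the bare domain, and that the condition never matches the www host.

```php
<?php
// Rough emulation (illustration only) of the Bluehost part 2 rule:
//   RewriteCond %{HTTP_HOST} ^domain.com$
//   RewriteRule ^$ http://domain.com/index.php [L,R=301]
// Returns the redirect target, or null when the rule does not fire.
function bluehost_home_redirect(string $host, string $path): ?string
{
    // No [NC] flag, so the match is case-sensitive; the unescaped dot
    // matches any single character, as in the original pattern.
    if (preg_match('/^domain.com$/', $host) !== 1) {
        return null; // www.domain.com never matches this condition
    }
    // In .htaccess context the path is seen without its leading slash,
    // and ^$ only matches an empty path, i.e. the bare domain.
    $relative = preg_replace('#^/#', '', $path);
    if ($relative !== '') {
        return null; // /page1.php and every other page are left alone
    }
    return 'http://domain.com/index.php';
}

foreach ([['domain.com', '/'], ['www.domain.com', '/'], ['domain.com', '/page1.php']] as [$host, $path]) {
    echo $host, $path, ' => ', bluehost_home_redirect($host, $path) ?? 'no redirect', "\n";
}
```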

Their Example #1 is more along the lines of what you want.

CODE
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]


That says if the request is not for the www version of the domain name, redirect to the www version and append any file information from the original request.
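For readers who find the rewrite syntax opaque, the following PHP sketch mimics what that RewriteCond/RewriteRule pair decides for a given request (the condition is written here with escaped dots). It is an illustration only, assuming PHP 7 or later; canonical_redirect() is a hypothetical name and www.domain.com is the placeholder from the example. One detail worth knowing: unless the substitution contains its own '?', mod_rewrite passes the original query string through unchanged.

```php
<?php
// Rough emulation (illustration only) of the www-canonicalisation rule:
//   RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
//   RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
// Returns the redirect target, or null when the host is already canonical.
function canonical_redirect(string $host, string $path, string $query = ''): ?string
{
    // [NC] makes the host comparison case-insensitive.
    if (preg_match('/^www\.domain\.com$/i', $host) === 1) {
        return null; // Already on the www host, so the rule is skipped.
    }
    // ^(.*)$ captures the whole path (minus the leading slash in .htaccess
    // context) and $1 re-attaches it to the www host.
    $target = 'http://www.domain.com/' . preg_replace('#^/#', '', $path);
    // The substitution contains no '?', so the original query string is kept.
    if ($query !== '') {
        $target .= '?' . $query;
    }
    return $target;
}

echo canonical_redirect('domain.com', '/page1.php') ?? 'no redirect', "\n";
echo canonical_redirect('WWW.Domain.com', '/page1.php') ?? 'no redirect', "\n";
echo canonical_redirect('domain.com', '/index.php', 'q=Canon+A300') ?? 'no redirect', "\n";
```

Note that every page gets the www treatment (page1.php included), but a request that is already on www, such as www.domain.com/index.php?q=Canon+A300, gets no redirect at all. So this rule handles the non-www question, not the ?q= duplicates.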

QUOTE
To review my stats, I need to log in to cPanel and click the Web/FTP Stats link, so that should be okay, I guess? There is no URL to access the stats; you can only get to them by logging in to cPanel.


There's a URL to view the stats. There has to be. The only question is if it's set up to be password protected or not. You should be able to get the actual URL address by looking at the Properties in your browser while viewing your stats. In IE that would be File > Properties if you have the menu bar visible. If not try right clicking to see if it brings up a Properties option.

re: httpd.conf: That's probably not something you're going to be able to see or configure. On a shared server every site shares the Apache configuration, so it's going to be several layers deeper than any domain on the server can reach.

#11 SERPico

Posted 28 May 2007 - 06:50 AM

QUOTE
There's a URL to view the stats. There has to be. The only question is if it's set up to be password protected or not. You should be able to get the actual URL address by looking at the Properties in your browser while viewing your stats. In IE that would be File > Properties if you have the menu bar visible. If not try right clicking to see if it brings up a Properties option.

You're right: when I use the links, a username and password window pops up.
So that's one possible cause of these duplicate URLs crossed off the list.

QUOTE
re: httpd.conf: That's probably not something you're going to be able to see or configure. On a shared server every site shares the Apache configuration, so it's going to be several layers deeper than any domain on the server can reach.


I see, thanks for the explanation.

QUOTE
You'll need to ask your host why the code they provide does or doesn't work. I'd hate to guess since their server setup is probably a little bit different than mine.

FWIW, I wouldn't do the /index.php bit in there at all. Just point it to the domain name without the file name. Also realize that the provided code is only going to help with domain level queries. It's not going to correct every page.

Hmm, I really thought that would be the solution for these duplicate indexed pages. If that's not the fix I'm looking for, then there's no need to ask why the code doesn't jump to index.php.
Also, I didn't realize it only works for the domain itself. What kind of code do you need to redirect a URL like http://mydomain.com/page1.php to http://www.mydomain.com/page1.php?
Not just for one specific URL, but for all incoming requests without the www?

I mean, that's the whole idea of redirecting non-www to www, right?
To prevent two different versions of every page being indexed, not only the home page?

That was my understanding, anyway.

QUOTE
Their Example #1 is more along the lines of what you want.

CODE
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]


That says if the request is not for the www version of the domain name, redirect to the www version and append any file information from the original request.


Thanks, i will use this then.

This seems to be quite a complicated problem. Since you don't have a solution either, or know what could be causing these duplicate pages, it must indeed be a very complicated issue, and probably something only Yahoo itself can answer.

Not wanting to be at the mercy of Yahoo, I think it's probably best to switch to another hosting provider, as this is probably something on their end.

What do you think I should do, Randy?
I'm totally lost here. If someone like you, with such extensive knowledge of SEO and .htaccess, doesn't see a solution, then it's probably a real mind-bender.

I really appreciate your input on this, Randy.
Thanks for offering your assistance!

#12 SERPico

Posted 28 May 2007 - 01:04 PM

I found the culprit!
It appears that, since the domain was parked for a while, the parking provider did something that caused all these URLs to be stored in the indexes.

I was using Gigablast.com and clicked its cache link, which (unlike the Y! cache) did not show the home page the domain currently holds.

The cache showed a parking page layout from a former domain parking provider.

I've already contacted Yahoo about this, so I'm awaiting their response to see what they will do about it, and I will contact Google about it as well.

Secondly, about this suggestion you mentioned, Randy:

QUOTE
You didn't say: does your index.php page ever need to use any variables? If not, you could set up a simple PHP function at the top of your index.php page to look for query strings and perform a 301 back to just the domain name if one exists. That might help Y! get things straight.


Not only Yahoo but Google will also encounter all these duplicate index.php pages, which is probably why the site hasn't been indexed yet.

What kind of code should I place on the original index.php page?
Do you have some code in mind?

Or, if this requires more than a quick suggestion, could you please be more specific so I can give the specifications to a coder?

Thanks!

#13 Randy

Posted 28 May 2007 - 03:58 PM

QUOTE
What kind of code should I place on the original index.php page?
Do you have some code in mind?


Important Note: Only use this if other legitimate pages aren't available via your index.php page with a query string. Also, make sure you place it at the very beginning of the page code. If you use includes for your header you'll need to do some additional detection to make sure it only activates when someone hits your index.php page. It has to be the first thing parsed, before any output (even a blank line or a space) is sent to the browser.

CODE
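<?php
// Redirect any request for index.php that carries a query string
// (e.g. /index.php?q=Canon+A300) back to the bare home page with a 301.
if (!empty($_SERVER['QUERY_STRING'])) {
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://www.yourdomain.com/");
    exit();
}
?>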


If you want to play with a test version of it, I stuck one up in a subdirectory on my personal site here. Go ahead, try to make it deliver the index page there with the query in the URL. I dare ya.
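For the shared-header case mentioned in the note above, one option is to check which script is running before redirecting. A minimal sketch, assuming PHP 7 or later and that the home page is served as /index.php at the site root; needs_home_redirect() is a hypothetical helper name and www.yourdomain.com is a placeholder.

```php
<?php
// Hypothetical helper for a header file included by several pages:
// only the site-root /index.php with a non-empty query string redirects.
function needs_home_redirect(string $scriptName, string $queryString): bool
{
    return $scriptName === '/index.php' && $queryString !== '';
}

// Skip the redirect when the file is run from the command line (e.g. tests).
if (PHP_SAPI !== 'cli'
    && needs_home_redirect($_SERVER['SCRIPT_NAME'] ?? '', $_SERVER['QUERY_STRING'] ?? '')) {
    // Still has to run before any output is sent, or header() will fail.
    header('Location: http://www.yourdomain.com/', true, 301);
    exit();
}
```

Other pages that include the same header, such as a hypothetical /about.php, pass through untouched because their SCRIPT_NAME differs. If the select menu submits its choice to index.php via GET, this check would redirect that too, which is exactly the situation the note warns about.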

#14 SERPico

Posted 29 May 2007 - 02:37 AM

QUOTE(Randy @ May 28 2007, 10:58 PM)
Important Note: Only use this if other legitimate pages aren't available via your index.php page with a query string. Also, make sure you place it at the very beginning of the page code. If you use includes for your header you'll need to do some additional detection to make sure it only activates when someone hits your index.php page. It has to be the first thing parsed, before any output (even a blank line or a space) is sent to the browser.

CODE
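<?php
// Redirect any request for index.php that carries a query string
// (e.g. /index.php?q=Canon+A300) back to the bare home page with a 301.
if (!empty($_SERVER['QUERY_STRING'])) {
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://www.yourdomain.com/");
    exit();
}
?>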


If you want to play with a test version of it, I stuck one up in a subdirectory on my personal site here. Go ahead, try to make it deliver the index page there with the query in the URL. I dare ya.


Morning Randy,

I pasted in a query string similar to the ones indexed for my domain, like index.php?q=sporting+goods, and it jumps right back to the original URL.

I'll test it first to make sure it doesn't interfere with the PHP select menu on the index.php page.
Hopefully this will make all these bogus indexed pages disappear/drop from the index?

What I'm wondering, though, is this: since this is a 301 that redirects the other versions of index.php to the original index.php, will these bogus pages still stay in the index and still pass any negative influence over to the original index.php?

The SEs will start to recognize the original index.php as the original, but will it keep getting this negative influence from these bogus pages, or will these bogus pages just die off?

Thanks!

Edit: By the way, how did you integrate the code?
I don't see it in the page source.


Edited by SERPico, 29 May 2007 - 03:55 AM.


#15 Randy

Posted 29 May 2007 - 10:55 AM

You shouldn't see it in the source code. It all happens before the <html> tag is even written, so the redirect triggers at the server level before anything else is sent.
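Because the redirect is part of the HTTP response headers rather than the HTML, you have to look at the headers themselves to see it. A minimal sketch using PHP's built-in get_headers(), assuming PHP 7 or later, with first_status_code() as a hypothetical helper and www.yourdomain.com as a placeholder. get_headers() follows redirects by default, so the array can hold several responses; the first status line belongs to the URL you asked for.

```php
<?php
// Return the numeric code from the first HTTP status line in a
// get_headers()-style array, e.g. "HTTP/1.1 301 Moved Permanently" -> 301.
function first_status_code(array $headers): ?int
{
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m) === 1) {
            return (int) $m[1];
        }
    }
    return null;
}

// Usage against a live site (placeholder URL; needs network access):
// $headers = get_headers('http://www.yourdomain.com/index.php?q=sporting+goods');
// if ($headers !== false) {
//     echo first_status_code($headers), "\n"; // 301 once the redirect is live
// }
```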

As far as the rest, the other bogus pages won't get dropped immediately, but they will eventually. The 301 is effectively telling any spider that hits them that they no longer exist and have been replaced by your real index page.

Forget about even checking it for 4-6 months, then check back. By then the engines should have had an opportunity to catch up and correct things on their end.





We are now a read-only forum.
 
No new posts or registrations allowed.