Yahoo Sitematch And Non-yahoo Search Engines
Started by AlDugan, Feb 24 2005 04:52 PM
7 replies to this topic
#1
Posted 24 February 2005 - 04:52 PM
If I submit a feed using Yahoo Sitematch with unique URLs (URLs that another search engine spider could not normally find, because nothing on our site or anyone else's links to them), would it ever be possible for another search engine to find those URLs and index them?
For example, say my normal URL is: mydomain.com?product=1234
So I submit this page to Yahoo Sitematch as: mydomain.com?product=1234&ref=sitematch
Could that second URL ever be found by another engine and thus cause duplicate content issues? (Assuming our site never links to this page internally, nor do any other sites.)
#2
Posted 24 February 2005 - 05:10 PM
I'm not positive, but my guess would be yes. To be safe, I'd exclude them from other engines via robots.txt.
#3
Posted 24 February 2005 - 05:19 PM
Hmmm... I was afraid of that. That would be a problem though, because they are not actually different pages; they just have different parameters. Is it possible to use robots.txt to exclude a large list (about 1,000) of specific URLs in this case?
#4
Posted 24 February 2005 - 06:55 PM
Again, I think so, but am not sure. You may want to pm Alan as he's the one I turn to with any robots.txt questions!
You should be able to list the exact URLs you've given to Yahoo and exclude them from the appropriate engines.
#5
Posted 25 February 2005 - 09:29 AM
Ok, Thank You Jill.
#6
Posted 25 February 2005 - 10:39 AM
Strange, I was PM'd! 
Three ways to avoid duplicate content...
You can handle this using robots.txt as follows:
1) Instead of using mydomain.com/?product=1234&ref=sitematch, use mydomain.com/?ref=sitematch&product=1234
2) Then exclude ?ref= using robots.txt:
CODE
User-agent: *
Disallow: /?ref=
The great thing about robots.txt is that it stops lots of pages being served to robots when those pages will never be indexed. However, it's not perfect: the rule only matches URLs that begin with /?ref=, so in theory people can still link to mydomain.com/?product=1234&ref=sitematch and a robot would be free to fetch it.
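To make the parameter-order point concrete, here is a minimal Python sketch of the plain prefix matching that a Disallow line implies. It is an illustration only, not how any particular engine's robot is implemented; the function name is_disallowed and the http:// scheme on the example URLs are assumptions added for the example.

```python
from urllib.parse import urlsplit

# The single rule from the robots.txt shown in this post.
DISALLOW_PREFIXES = ["/?ref="]


def is_disallowed(url, prefixes=DISALLOW_PREFIXES):
    """Return True if the URL's path plus query string starts with a Disallow prefix."""
    parts = urlsplit(url)
    target = parts.path or "/"  # treat an empty path as the site root
    if parts.query:
        target += "?" + parts.query
    return any(target.startswith(prefix) for prefix in prefixes)


# ref first: matches the rule, so a well-behaved robot would skip it.
print(is_disallowed("http://mydomain.com/?ref=sitematch&product=1234"))  # True
# product first: does not match the prefix, so it could still be fetched.
print(is_disallowed("http://mydomain.com/?product=1234&ref=sitematch"))  # False
# The normal URL stays crawlable.
print(is_disallowed("http://mydomain.com/?product=1234"))  # False
```

This is why step 1 moves ref to the front of the query string: the rule can only catch URLs written in that order.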
Alternatively, if the content is delivered dynamically, put a meta robots noindex tag in the header if the URL contains a ref query parameter:
CODE
<head>
...
<META NAME="ROBOTS" CONTENT="NOINDEX">
...
</head>
This way, the content may be requested lots of times by robots, but it will never be indexed. Use the first two solutions together for maximum benefit.
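As a rough sketch of the "if the URL contains a ref query parameter" check, something like the following could decide whether to emit the tag. The function name robots_meta_for is hypothetical, and how the result gets written into the page header depends on whatever generates your pages.

```python
from urllib.parse import urlsplit, parse_qs

NOINDEX_TAG = '<META NAME="ROBOTS" CONTENT="NOINDEX">'


def robots_meta_for(url):
    """Return the noindex tag when the URL has a ref parameter, else an empty string."""
    # keep_blank_values=True so that a bare "ref=" still counts as present.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return NOINDEX_TAG if "ref" in query else ""


# Tagged whichever way round the parameters come:
print(robots_meta_for("http://mydomain.com/?product=1234&ref=sitematch"))
print(robots_meta_for("http://mydomain.com/?ref=sitematch&product=1234"))
# The normal URL gets no tag and stays indexable:
print(repr(robots_meta_for("http://mydomain.com/?product=1234")))
```

Unlike the robots.txt prefix rule, this check does not care about parameter order.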
Alternatively, make mydomain.com/?product=1234&ref=sitematch a 301 redirect to mydomain.com/?product=1234 (after having set a cookie or whatever to store the referrer). This is probably the best approach but may require the most work, too.
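The redirect approach could look roughly like this in Python. It only works out the status code and headers; wiring it into a real server is left out. The function name redirect_for, the cookie name ref and the Path=/ attribute are assumptions for illustration, not anything Sitematch requires.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote


def redirect_for(url):
    """Return (301, headers) pointing at the URL without ref, or None if there is no ref."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    ref_values = [value for key, value in pairs if key == "ref"]
    if not ref_values:
        return None  # already the normal URL, nothing to redirect

    # Rebuild the query string without the ref parameter.
    kept = [(key, value) for key, value in pairs if key != "ref"]
    clean_url = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

    headers = {
        "Location": clean_url,
        # Remember the referrer before it disappears from the URL (hypothetical cookie name).
        "Set-Cookie": "ref=%s; Path=/" % quote(ref_values[0]),
    }
    return 301, headers


print(redirect_for("http://mydomain.com/?product=1234&ref=sitematch"))
print(redirect_for("http://mydomain.com/?product=1234"))  # None
```

With this in place only mydomain.com/?product=1234 ever returns page content, so there is a single URL for any engine to index.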
#7
Posted 25 February 2005 - 10:47 AM
Is this just because you want to see it as a landing page for marketing purposes?
I don't see how you would lose out even if it was classed as duplicate content. If it is the same page, it is the same page.
Just curious
#8
Posted 25 February 2005 - 04:51 PM
Thanks Alan.
Paz, even though they really are the same page, to a search engine they look like two different pages.








