Dynamic Content - Using Wildcards In Robots.txt
Started by worldofrugs, Nov 04 2008 02:20 PM
7 replies to this topic
#1
Posted 04 November 2008 - 02:20 PM
I've read through many posts, but I'm unsure how to write this in my robots.txt file.
I have dynamic pages in ASP...
Example: products.asp
However, because I'm afraid of duplicate content, I'd like to filter out the query strings...
To give you an idea:
I have the page products.asp in the folder sub.
When sorted (via a button), the URL looks like: www.xxx.com/sub/products.asp?Order=Sorter%5FNumber&Dir=ASC
I can also search for a term like: Red
The URL would then look like: www.xxx.com/sub/products.asp?s_keyword=Red
Adding a second keyword: www.xxx.com/sub/products.asp?s_keyword=Red&s_keyword1=Blue
Then adding the sorting function: www.xxx.com/sub/products.asp?s_keyword=Red&s_keyword1=Blue&Order=Sorter%5FNumber&Dir=ASC
The products.asp listing is split over several pages (with no additional search string or sorting), so that each page shows only a set number of products instead of all of them on one page (which takes too long to load!).
The URLs for the second and further pages look like:
www.xxx.com/sub/products.asp?Page=2
The question:
How can I allow products.asp to be indexed, WITH the following pages, but prevent the sorted, searched, etc. versions from being indexed (as they would produce duplicate content), for Google, Yahoo & MSN?
#2
Posted 04 November 2008 - 04:03 PM
Welcome, worldofrugs!
So I have a question, but let's let that wait for a moment.
If I understand the situation correctly, you want to exclude URLs for the /sub/products.asp page, but only if they contain one of three variables in the query string. Those three variables are Order=, s_keyword= and s_keyword1=. Is this right?
And now the question: can s_keyword1= show up in a URL without the s_keyword= variable also being present? If the "1" version can only appear when the non-numbered version is there, you don't really need a separate rule for it. Those URLs would already be caught by the exclusion of s_keyword=.
If the "1" version can show up without the non-numbered version, you'd need to add a third Disallow line specifically for that variable.
Now, assuming I have all of the above straight and s_keyword1= can't show up without s_keyword= being part of the URL, a robots.txt entry like the following should do the trick.
CODE
User-Agent: *
Disallow: /sub/products.asp?*Order=
Disallow: /sub/products.asp?*s_keyword=
With the above, spiders would be allowed to crawl the /sub/products.asp file as long as neither Order= nor s_keyword= is in the URL, no matter where those variables fall in the query string. And it'll work for the big three engines (Google, Yahoo, MSN/Live), since they all support wildcards in robots.txt even though wildcards aren't part of the original standard.
And just a point of clarity: you don't need a wildcard at the end of either rule, since robots.txt rules have an implied wildcard at the end of every line unless you use the $ character to turn it off.
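As a rough way to see how these two rules behave, the Python sketch below translates robots.txt patterns into regular expressions and runs them against the URLs from the original question. It is a simplified sketch, not any search engine's real parser: it assumes Google-style wildcard semantics (* matches any run of characters, a trailing $ anchors the end of the URL, otherwise a rule is a prefix match), looks only at Disallow lines, and ignores Allow precedence and percent-encoding normalization. The helper names rule_to_regex and is_disallowed are made up for this example.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (assumed Google-style wildcards).

    '*' matches any run of characters (including '/' and '?').
    A trailing '$' anchors the match to the end of the URL; without it the
    rule is a prefix match, i.e. there is an implied '*' at the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece so '.', '?' etc. are matched literally.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(r).match(path_and_query) for r in disallow_rules)


# The two rules from the reply above.
RULES = [
    "/sub/products.asp?*Order=",
    "/sub/products.asp?*s_keyword=",
]

URLS = [
    "/sub/products.asp",
    "/sub/products.asp?Page=2",
    "/sub/products.asp?Order=Sorter%5FNumber&Dir=ASC",
    "/sub/products.asp?s_keyword=Red",
    "/sub/products.asp?s_keyword=Red&s_keyword1=Blue&Order=Sorter%5FNumber&Dir=ASC",
    "/sub/products.asp?s_keyword1=Blue",  # not caught by either rule
]

for url in URLS:
    print("BLOCKED" if is_disallowed(url, RULES) else "allowed", url)
```

Run as-is, it reports /sub/products.asp, the ?Page=2 page and ?s_keyword1=Blue as allowed and the other three as blocked, which is why the question about whether s_keyword1= can appear on its own matters.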
#3
Posted 04 November 2008 - 04:19 PM
Thanks for your fast reply, Randy, and thanks for the warm welcome!
s_keyword= and s_keyword1= can show up separately, but I assume that's easy to handle by adding Disallow: /sub/products.asp?*s_keyword1= to your list, correct?
As I have more sorters (like ?Order=Sorter%5FOrigin&Dir=ASC or ?Order=Sorter%5FOrigin&Dir=DEC), I assume these are blocked as well with your solution?
Also, just to make sure (gosh, I'm so paranoid!)...
So of the URLs with a query string, only the ?Page= ones will be indexed, correct?
That string ONLY appears if you go to products.asp and click through to the following pages. No extra strings are added.
Thanks for the help so far, Randy!
#4
Posted 04 November 2008 - 05:48 PM
QUOTE
s_keyword= and s_keyword1= can show up separately, but I assume that's easy to handle by adding Disallow: /sub/products.asp?*s_keyword1= to your list, correct?
1. Yup! That'll do the trick.
2. If you disallow the Order= variable, any URL for the /sub/products.asp page that contains that variable will be excluded. All of your examples would be excluded.
3. ?Page= will be wholly unaffected by the exclusion, as long as those URLs don't contain one of the variables you want to block. Those will continue to be crawled and indexed.
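The three answers can be checked against the extended rule set (the two original Disallow lines plus the s_keyword1= line) with a short Python sketch. It assumes Google-style wildcard matching (* = any characters, trailing $ = end anchor, implied trailing wildcard otherwise) and looks only at Disallow lines; is_disallowed is a hypothetical helper, not part of any engine's tooling. The URL combining Page=2 with Order= is also hypothetical; it illustrates the "as long as" condition in answer 3.

```python
import re


def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches, assuming Google-style rules:
    '*' matches any characters, a trailing '$' anchors the end,
    and otherwise the rule is a prefix match (implied trailing '*')."""
    for rule in disallow_rules:
        anchored = rule.endswith("$")
        core = rule[:-1] if anchored else rule
        regex = ".*".join(re.escape(piece) for piece in core.split("*"))
        if re.match(regex + ("$" if anchored else ""), path_and_query):
            return True
    return False


# The two original rules plus the extra s_keyword1= line from answer 1.
RULES = [
    "/sub/products.asp?*Order=",
    "/sub/products.asp?*s_keyword=",
    "/sub/products.asp?*s_keyword1=",
]

# URL -> whether the answers say it should be blocked.
EXPECTED = {
    "/sub/products.asp?s_keyword1=Blue": True,                # answer 1
    "/sub/products.asp?Order=Sorter%5FOrigin&Dir=ASC": True,  # answer 2
    "/sub/products.asp?Order=Sorter%5FOrigin&Dir=DEC": True,  # answer 2
    "/sub/products.asp?Page=2": False,                        # answer 3
    # Hypothetical: paging combined with sorting is still blocked.
    "/sub/products.asp?Page=2&Order=Sorter%5FNumber&Dir=ASC": True,
}

for url, should_block in EXPECTED.items():
    blocked = is_disallowed(url, RULES)
    status = "BLOCKED" if blocked else "allowed"
    verdict = "ok" if blocked == should_block else "MISMATCH"
    print(f"{status:8} {verdict:8} {url}")
```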
#5
Posted 05 November 2008 - 09:54 AM
Thank you so much for the help and the great explanation Randy!
I owe you one
#6
Posted 05 November 2008 - 10:30 AM
One more question (if you don't mind)...
If I have more files, like products.asp and files.asp, that use the same structure, can I use this rule, and will both still be indexed as discussed for products.asp?
Disallow: /sub/*?*Order=
#7
Posted 05 November 2008 - 12:00 PM
Yup, that would work, as long as there are no other URLs in the /sub/ path that also use the Order= variable and that you need indexed.
That's the issue with any kind of wildcard. Make sure nothing you want indexed is an exception to what is basically a regular-expression ruleset.
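The wildcard caveat is easy to demonstrate. The Python sketch below assumes Google-style matching and uses a hypothetical is_disallowed helper. It runs Disallow: /sub/*?*Order= against the two files from the question and against some hypothetical URLs (reports.asp, archive/list.asp, a ReOrder= parameter) that the rule would also block.

```python
import re


def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """Assumed Google-style match: '*' = any characters (including '/'),
    trailing '$' = end anchor, otherwise an implied trailing '*'."""
    for rule in disallow_rules:
        anchored = rule.endswith("$")
        core = rule[:-1] if anchored else rule
        regex = ".*".join(re.escape(piece) for piece in core.split("*"))
        if re.match(regex + ("$" if anchored else ""), path_and_query):
            return True
    return False


RULES = ["/sub/*?*Order="]

URLS = [
    "/sub/products.asp?Order=Sorter%5FNumber&Dir=ASC",  # blocked, as intended
    "/sub/files.asp?Order=Sorter%5FNumber&Dir=ASC",     # blocked, as intended
    "/sub/files.asp?Page=3",                            # allowed
    "/sub/reports.asp?Order=Date",       # hypothetical page: also blocked
    "/sub/archive/list.asp?Order=Date",  # hypothetical: '*' crosses '/', blocked
    "/sub/files.asp?ReOrder=1",          # hypothetical: 'Order=' matches as a substring, blocked
    "/other/products.asp?Order=Date",    # outside /sub/: allowed
]

for url in URLS:
    print("BLOCKED" if is_disallowed(url, RULES) else "allowed", url)
```

Two things fall out of this: * also matches /, so subfolders under /sub/ are covered, and Order= is matched as a substring, so a parameter such as ReOrder= is caught too.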
#8
Posted 05 November 2008 - 05:45 PM
Thanks Randy, you have been a great help!