Dynamic Content - Using Wildcards In Robots.txt
Posted 04 November 2008 - 02:20 PM
I have dynamic pages in ASP...
However, since I'm worried about duplicate content, I'd like to keep the query-string variations out of the index...
To give you an idea:
I have the page products.asp in folder Sub
When sorted (via a button) the url will look like: www.xxx.com/sub/products.asp?Order=Sorter%5FNumber&Dir=ASC
I could also search for a term like: Red
The url would then look like: www.xxx.com/sub/products.asp?s_keyword=Red
Adding a second keyword: www.xxx.com/sub/products.asp?s_keyword=Red&s_keyword1=Blue
Then adding the sorting function: www.xxx.com/sub/products.asp?s_keyword=Red&s_keyword1=Blue&Order=Sorter%5FNumber&Dir=ASC
The products.asp page is split over several pages (with no additional search and/or sorting strings). This keeps only a limited number of products on each page, instead of all products on one page (which takes too long to load!).
The url for the second and further pages will look like: www.xxx.com/sub/products.asp?Page=2
How can I allow products.asp to be indexed, ALONG with its following pages, but prevent the sorting, search, etc. urls from being indexed (as they would produce duplicate content) in Google, Yahoo & MSN?
Posted 04 November 2008 - 04:03 PM
So I have a question, but let's let that wait for a moment.
If I understand the situation correctly you want to exclude pages that use the /sub/products.asp page, but only if they have one of three variables in the url string. Those three variables being Order=, s_keyword= and s_keyword1=. Is this right?
And now the question: can s_keyword1= show up in a url string without the s_keyword= variable also being in the picture? If the "1" version can only appear alongside the non-numbered version, you won't really need a separate rule for it; those urls would already be caught by an exclusion of s_keyword=.
If the "1" version can show up without the non-numbered version, you'd need to add a third disallow specifically for that variable.
Now, assuming I have all of the above straight and the s_keyword1= can't show up without s_keyword= being part of the url, a robots.txt entry like the following should do the trick.
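A sketch of such an entry, reconstructed from the two variables described here (the exact snippet from the post isn't preserved, so take this as an illustration):

```
User-agent: *
Disallow: /sub/products.asp?*Order=
Disallow: /sub/products.asp?*s_keyword=
```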
With the above, spiders would be allowed to get to the /sub/products.asp file as long as neither Order= nor s_keyword= appears in the url string, no matter where those fall in the query string. And it'll work for the big three engines (Google, Yahoo, MSN/Live), since they all support wildcards in robots.txt even though wildcards aren't part of the original standard.
And just a point of clarity: you don't need a wildcard at the end of either line, since in robots.txt there is an implied wildcard at the end of every line unless you use the $ character to turn it off.
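The matching behaviour described above (the * wildcard, the implied trailing wildcard, and the $ anchor) can be sketched in Python. This is a simplified illustration of how the big engines match Disallow patterns, not a full robots.txt parser; the rule list mirrors the Disallow lines discussed in this thread:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check whether a robots.txt Disallow pattern matches a URL path.

    '*' matches any run of characters, a trailing '$' anchors the match
    to the end of the path, and otherwise the pattern only needs to
    match a prefix of the path (the implied trailing wildcard).
    """
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"  # explicit end-of-url anchor
    return re.match(pattern, path) is not None

# The three Disallow patterns from this thread.
rules = [
    "/sub/products.asp?*Order=",
    "/sub/products.asp?*s_keyword=",
    "/sub/products.asp?*s_keyword1=",
]

def is_blocked(path: str) -> bool:
    """A url is blocked if any Disallow pattern matches it."""
    return any(rule_matches(r, path) for r in rules)

print(is_blocked("/sub/products.asp?Order=Sorter%5FNumber&Dir=ASC"))  # True
print(is_blocked("/sub/products.asp?Page=2"))                         # False
```

Note that the plain pagination urls (?Page=) and the bare /sub/products.asp slip through all three patterns, while any url carrying Order= or s_keyword= is caught wherever that variable falls in the query string.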
Posted 04 November 2008 - 04:19 PM
s_keyword= and s_keyword1= can show up separately, but I assume that's easy to fix by adding Disallow: /sub/products.asp?*s_keyword1= to your list, correct?
As I have more sorters (like ?Order=Sorter%5FOrigin&Dir=ASC or ?Order=Sorter%5FOrigin&Dir=DEC), I assume these are blocked as well by your solution?
Also just to make sure (gosh I'm so paranoid! )...
So only the pages whose urls carry the ?Page= parameter will be indexed, correct?
That string will ONLY appear if you go to products.asp and click through to the following pages. No extra strings are added.
Thanks for the help so far Randy!
Posted 04 November 2008 - 05:48 PM
1. Yup! That'll do the trick.
2. If you disallow the Order= variable part, any url that contains that particular variable and includes the /sub/products.asp page will be excluded. All of your examples would be excluded.
3. ?Page= will be wholly unaffected by the exclusion, as long as those urls don't contain one of the variables you want to block. Those will continue to be crawled and indexed.
Posted 05 November 2008 - 09:54 AM
I owe you one
Posted 05 November 2008 - 10:30 AM
If I have more files that use the same structure, like products.asp and files.asp, can I use the same approach, and will both still be indexed as discussed for products.asp?
Posted 05 November 2008 - 12:00 PM
Yes, the same approach extends to more files. That's the thing to watch with any type of wildcard: make sure there are no exceptions to what is basically a regular-expression ruleset that might cause issues.
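For example, assuming files.asp sits in the same /sub folder and uses the same variable names (a sketch, not a tested ruleset for your site):

```
User-agent: *
Disallow: /sub/products.asp?*Order=
Disallow: /sub/products.asp?*s_keyword=
Disallow: /sub/products.asp?*s_keyword1=
Disallow: /sub/files.asp?*Order=
Disallow: /sub/files.asp?*s_keyword=
Disallow: /sub/files.asp?*s_keyword1=
```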