Posted 04 August 2003 - 12:51 AM
Will try to make this brief and to the point-
Recently moved/re-designed my website to a dynamic type of web design/storefront solution. Since my site has been moved, I've started to notice within Google's SE results two different URL addresses going to the same page for the same keyword phrase. I spoke with my designer about it: my site has a product list page with one URL address, and then each product page has its own URL address. She explained to me that the technique being used (I guess it's a technique... please forgive my lack of proper wording) is "URL manipulation" and referred me to this link- http://www.xde.net/x...eb/qx/index.htm. The product list page has the more friendly SE URL addresses.
Could this be a problem? Any other feedback is also appreciated.
Thanks in advance!
Posted 04 August 2003 - 03:15 AM
If I am correct you have a dynamic site with a query-string URL, something like sitename.com/product.asp?id=2, and you have used QXASP to rewrite that so that the URL can be written as
sitename.com/product/2/ or similar
So as far as the search engines are concerned there are two distinct pages, BUT with exactly the same content on both. That is the problem. You need to get all references to the old URLs out of the index, so you must not have any links to them within your site. Any that come in from the outside you will have to try and get changed. If you can't do that, you will need what is termed a 301 permanent redirect to let the engines know that the URL has changed for good.
It is possible on an Apache server to do a blockwise 301 using a regular expression to catch all variations of the URLs.
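For what it's worth, a blockwise 301 of that sort might look something like the sketch below (the URL shapes are hypothetical, assuming the old query-string URLs were of the form /product.asp?id=2 and the new ones /product/2/):

```apache
# Hypothetical .htaccess sketch: 301-redirect every old query-string
# product URL to its SE-friendly equivalent with a single rule pair.
RewriteEngine On
# Capture the numeric id from the query string...
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# ...and redirect permanently, with a trailing "?" to strip the old
# query string from the target URL.
RewriteRule ^product\.asp$ /product/%1/? [R=301,L]
```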
Posted 04 August 2003 - 07:00 AM
You should definitely do whatever you can to minimize the chance of the engines picking up the exact same content using two URLs.
Oh, and welcome, Mati!
Posted 04 August 2003 - 10:14 AM
Very basically, what it does is put a program or script at the 404 page. Then when the "SEO friendly" URL is requested from the server, the 404 script/program processes the non-existent URL and displays the correct page while still showing the non-existent URL in the browser. Because of the way this works, you have to keep the current page as is. If you tried to do a 301 or something, it would also be applied to the "new" URL.
I have not found a way to remove the duplicate URLs. I guess you could restructure the site and use a robots.txt file to keep the bots off of the real pages. Too much of a pain for me, so I am still looking.
Posted 04 August 2003 - 12:18 PM
My designer decided to use this technique (URL manipulation- http://www.xde.net/x...eb/qx/index.htm ) after she read the book Search Engine Positioning, by Fredrick W. Marckini. I haven't read this book myself. Has anyone else, and if so, wouldn't the issue of two URL addresses getting picked up by the SEs (especially Google) be addressed in this book?
Posted 13 August 2003 - 08:04 AM
The first, and in my opinion the best, is to have the script actually generating the dynamic content both parse and create the SE friendly URLs. There are a few ways to accomplish this, the most common being to use the PATH_INFO environment variable instead of referencing the query string. If an application is designed this way, it's easy. Converting an existing script, though, can be a little more difficult.
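As a rough illustration of the PATH_INFO approach (names and URL layout are hypothetical), a CGI script can pull its parameters out of the path segment after the script name instead of the query string:

```python
# Sketch of a CGI script that reads the product id from PATH_INFO,
# so it answers to /cgi-bin/product.py/2/ rather than ?id=2.
import os


def product_id_from_path(path_info):
    """Extract a numeric product id from a PATH_INFO value like '/2/'."""
    parts = [p for p in path_info.split("/") if p]
    if parts and parts[0].isdigit():
        return int(parts[0])
    return None


if __name__ == "__main__":
    # The web server sets PATH_INFO to whatever follows the script name
    # in the requested URL (per the CGI specification).
    pid = product_id_from_path(os.environ.get("PATH_INFO", ""))
    print("Content-Type: text/html\n")
    print("<h1>Product %s</h1>" % pid)
```

The same idea works in ASP or any other server-side language; the point is that the script itself understands the friendly URL, so no rewriting layer is needed.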
The second way, mentioned by Andy, is to let the server rewrite the URLs on the fly. Again, there are a few ways to do this, with mod_rewrite being the most common. The danger here is the additional load put on the server for every page requested. For sites with little traffic that can be ignored; for busier sites it's still rarely an issue; but for large sites with lots of pages it can potentially bring a server to its knees if you get hit by an aggressive spider. The spider thinks it's requesting static pages and expects them to be returned with the usual speed of a static page. This is a problem with ANY of the methods for SE friendly URLs, but can be exacerbated by throwing mod_rewrite into the mix. It's not a deal-buster, but should be a concern. Make sure your hardware is up to the task.
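For reference, the on-the-fly rewrite is the mirror image of a redirect: no [R] flag, so the mapping is internal and the browser keeps showing the friendly URL (again, the URL shapes here are hypothetical):

```apache
# Hypothetical .htaccess sketch: silently map the SE-friendly URL
# back to the real dynamic script on every request.
RewriteEngine On
RewriteRule ^product/([0-9]+)/?$ /product.asp?id=$1 [L]
```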
The third way, also mentioned above, is to have an intervening script create the page. The usual choice is to install a custom 404 page, which is almost always even more resource-hungry than mod_rewrite. Additionally, the script has to be very carefully written to ensure the correct status code is returned to the requesting agent. Send back anything except a 200 status and you've probably defeated the whole scheme.
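To make the status-code point concrete, here is a rough sketch of such a 404-handler dispatcher (the /product/ URL scheme is hypothetical; Apache passes the originally requested URL to an ErrorDocument script in the REDIRECT_URL variable):

```python
# Sketch of a custom 404 handler used as an SE-friendly URL dispatcher.
# It must explicitly override the status for recognized URLs, otherwise
# the page goes out with a 404 and the engines will drop it.
import os


def handle(requested_url):
    """Build the CGI response (headers + body) for a requested URL."""
    if requested_url.startswith("/product/"):
        # Recognized SE-friendly URL: force a 200 so spiders index it.
        headers = ["Status: 200 OK", "Content-Type: text/html"]
        body = "<h1>Product page for %s</h1>" % requested_url
    else:
        # Genuinely missing page: let the 404 stand.
        headers = ["Status: 404 Not Found", "Content-Type: text/html"]
        body = "<h1>Not found</h1>"
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    print(handle(os.environ.get("REDIRECT_URL", "")))
```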
Properly implemented, NONE of these three techniques should result in duplicate pages. The whole idea of these techniques is to both display and process URLs that have no ? query string. If you have two identical pages in Google, one SE friendly and one with a question mark, something is being done wrong.
Fortunately, there's still a very simple answer, at least for Google.
Google recognizes certain extensions to the robots.txt file that can help eliminate the problem of having two identical pages indexed with differing URLs. Specifically, add the following:
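(The snippet itself appears to have been lost from the post; based on Google's documented wildcard support in robots.txt, the directive meant is presumably:)

```
User-agent: Googlebot
Disallow: /*?
```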
Doing so will prevent Google from indexing ANY page with a question mark in the URL.