
Url Manipulation

This topic has been archived. This means that you cannot reply to this topic.
6 replies to this topic

#1 mati


    HR 1

  • Members
  • Pip
  • 2 posts

Posted 04 August 2003 - 12:51 AM

I need some help in understanding the use of "url manipulation". I'm not a web designer/SEO by trade, just own an online ecommerce website and also a newsletter sub of "High Rankings". Thanks Jill for putting this forum together. :D

Will try to make this brief and to the point-

Recently moved/re-designed my website to a dynamic type of web design/storefront solution. Since my site has been moved, I've started to notice within Google's SE results two different url addresses going to the same page for the same keyword phrase. I spoke with my designer about it: my site has a product list page with one url address, and then each product page has its own url address. She explained to me that the technique being used (I guess it's a technique... please forgive my lack of proper wording) is "url manipulation" and referred me to this link- http://www.xde.net/x...eb/qx/index.htm. The product list page has the more SE-friendly url addresses.


Could this be a problem? Any other feedback is also appreciated.

Thanks in advance!

#2 ukseo


    HR 1

  • Members
  • Pip
  • 1 posts

Posted 04 August 2003 - 03:15 AM

If you have two URLs that are filled with content from one page, you could be heading towards problems, as you essentially have duplicate content.
If I am correct, you have a dynamic site with a url like:

sitename.com/product.asp?id=2 (for example)

and you have used QXASP to rewrite that so that the url can be written as

sitename.com/product/2/ or similar

So as far as the search engine thinks, there are two distinct pages BUT with exactly the same content on both. That is the problem. You need to get all references to the old ones out of the index, so you must not have any links to them within your site. Any that come in from the outside you will have to try and get changed. If you can't do that, you will need what is termed a 301 permanent redirect to let the engines know that the url has changed for good.

It is possible on an Apache server to do a blanket 301 using regular expressions to catch all variations of the urls.
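That blanket 301 might look something like this (a sketch only — the `product.asp?id=` URL shape is an assumption based on the example above, not the poster's actual site):

```apache
# .htaccess sketch: permanently redirect every old query-string
# product URL to its SE-friendly equivalent in one rule.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
RewriteRule ^product\.asp$ /product/%1/? [R=301,L]
```

The trailing `?` strips the old query string from the redirect target, so the engines see only the friendly URL from then on.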

Good luck

#3 Jill


    Recovering SEO

  • Admin
  • 33,244 posts

Posted 04 August 2003 - 07:00 AM

Most of the time, the search engines will sort that kind of thing out for themselves, and only show one of the pages of dupe content. It's very common for dynamically generated sites to show dupe content, which is why the search engines are hesitant to index these pages in the first place.

You should definitely do whatever you can to minimize the chance of the engines picking up the exact same content using two URLs.

Oh, and welcome, Mati! ;)


#4 JohnC


    HR 2

  • Banned
  • PipPip
  • 10 posts

Posted 04 August 2003 - 10:14 AM

I have looked into using this very software. I believe it uses a technology called "server-side replace", which is very different from a "server-side redirect".

Very basically, what it does is put a program or script at the 404 page. Then when the "SEO friendly" URL is requested from the server, the 404 script/program processes the non-existent URL and displays the correct page while still showing the non-existent url in the browser. Because of the way this works, you have to keep the current page as is. If you tried to do a 301 or something this would also be applied to the "new" URL.

I have not found a way to remove the duplicate URLs. I guess you could restructure the site and use a robots.txt file to keep the bots off of the real pages. Too much of a pain for me, so I am still looking. ;)

#5 mati


    HR 1

  • Members
  • Pip
  • 2 posts

Posted 04 August 2003 - 12:18 PM

Thanks everyone for your comments! Not sure what to do?

My designer decided to use this technique (url manipulation- http://www.xde.net/x...eb/qx/index.htm ) after she read the book Search Engine Positioning, by Fredrick W. Marckini. I haven't read this book myself. Has anyone else? And if so, wouldn't the issue of two url addresses getting picked up by the SEs (especially Google) be addressed in this book?

Thanks all!

~ :cheers:

#6 andy


    HR 1

  • Members
  • Pip
  • 1 posts

Posted 13 August 2003 - 06:59 AM

If you use Apache then next time take a look at Rewrite rules.


Apache Rewrite Rules

#7 Ron Carnell


    HR 6

  • Moderator
  • 968 posts

Posted 13 August 2003 - 08:04 AM

Off the top of my head, there are three very different techniques used to incorporate SE friendly URLs into a dynamic site.

The first, and in my opinion the best, is to have the script actually generating the dynamic content both parse and create the SE friendly URLs. There are a few ways to accomplish this, the most common being to use the PATH_INFO environment variable instead of referencing the query string. If an application is designed this way, it's easy. Converting an existing script, though, can be a little more difficult.
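Ron's first technique can be sketched like this (a hypothetical Python CGI handler; the path segments and parameter names here are illustrative assumptions, not any particular storefront's actual scheme):

```python
import os

def parse_path_info(path_info):
    """Split an SE-friendly URL like '/product/2/' into the parameters the
    script would otherwise read from a query string (?section=product&id=2)."""
    parts = [p for p in path_info.split("/") if p]
    if len(parts) != 2 or not parts[1].isdigit():
        return None  # unrecognized shape: let the caller serve a 404
    return {"section": parts[0], "id": int(parts[1])}

# For a request such as /product.cgi/product/2/, the web server places
# "/product/2/" in the PATH_INFO environment variable for the CGI script.
params = parse_path_info(os.environ.get("PATH_INFO", "/product/2/"))
print(params)
```

Because the same script both emits these URLs in its links and parses them on the way in, there is only ever one URL per page — which is exactly why this approach avoids the duplicate-content problem.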

The second way, mentioned by Andy, is to let the server rewrite the URLs on the fly. Again, there are a few ways to do this, with mod_rewrite being the most common. The danger here is the additional load put on the server for every page requested. For sites with little traffic that can be ignored, and even for busier sites it's rarely an issue, but for large sites with lots of pages it can potentially bring a server to its knees if you get hit by an aggressive spider. The spider thinks it's requesting static pages and expects them to be returned with the usual speed of a static page. This is a problem with ANY of the methods for SE-friendly URLs, but it can be exacerbated by throwing mod_rewrite into the mix. It's not a deal-breaker, but it should be a concern. Make sure your hardware is up to the task.
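A minimal mod_rewrite version of this second technique (the `product.asp?id=` mapping is an assumed example, matching the URLs discussed earlier in the thread):

```apache
# .htaccess sketch: internally map the SE-friendly URL back to the
# real dynamic script. No [R] flag, so this is a silent internal
# rewrite -- the friendly URL stays in the browser and the index.
RewriteEngine On
RewriteRule ^product/([0-9]+)/?$ /product.asp?id=$1 [L]
```

Note this rule runs on every matching request, which is the per-request overhead Ron is warning about.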

The third way, also mentioned above, is to have an intervening script create the page. The usual choice is to install a custom 404 page, which is almost always even more resource-hungry than mod_rewrite. Additionally, the script has to be very carefully written to ensure the correct status code is returned to the requesting agent. Send back anything except a 200 status and you've probably defeated the whole scheme.
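The logic of that custom 404 handler, and why the status code matters, can be sketched like this (Python pseudocode for illustration; real products such as QXASP implement this inside the server, and the `pages` mapping here is a stand-in for a database lookup):

```python
def handle_404(requested_path, pages):
    """Custom 404-handler sketch: if the 'missing' SE-friendly URL maps to
    a real page, serve that content with a 200 status; otherwise return a
    genuine 404. `pages` maps friendly paths to content."""
    body = pages.get(requested_path)
    if body is not None:
        # Crucial step: override the 404 the server would otherwise send.
        # Returning anything but 200 here tells spiders the page is broken.
        return 200, body
    return 404, "Not Found"

pages = {"/product/2/": "<html>Widget #2</html>"}
print(handle_404("/product/2/", pages))  # friendly URL served as a normal page
print(handle_404("/missing/", pages))    # a genuinely missing URL stays a 404
```

This also shows why, as JohnC noted, you can't simply 301 the old URLs away: every request, old or new, is funneled through the same error handler.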

Properly implemented, NONE of these three techniques should result in duplicate pages. The whole idea of these techniques is to both display and process URLs that have no ? query string. If you have two identical pages in Google, one SE friendly and one with a question mark, something is being done wrong.

Fortunately, there's still a very simple answer, at least for Google.

Google recognizes certain extensions to the robots.txt file that can help eliminate the problem of having two identical pages indexed with differing URLs. Specifically, add the following:

User-agent: Googlebot
Disallow: /*?

Doing so will prevent Google from indexing ANY page with a question mark in the URL.
