I work on a site with ~30,000 products that we want to de-index, as a great deal of the content on those product pages is duplicated on other sites.
The products are uploaded from feeds, and a number of other sites also receive the same feeds, which not only creates duplicate content but also causes us to be flagged as an affiliate site (we know this is a bad thing and we're fixing it now!). Our plan is to significantly reduce the number of products we offer, write great unique content for the ones that remain, and then let the engines back in to crawl them.
- Problem one: The product URLs don't sit in a folder that I can block engines from crawling, e.g. we have /the-big-blue-widget.aspx instead of /products/the-big-blue-widget.aspx
- Problem two: The product pages need to be inaccessible to search engines, but our customers still need to be able to reach them if they want more information (they can add to their basket straight from the listing on the category page).
My own solution, which I'm still working through, is to 301 all the "/the-big-blue-widget.aspx" pages to "/products/the-big-blue-widget.aspx", let those new URLs get picked up, then block the /products/ folder in robots.txt and submit a de-index (removal) request.
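For reference, the robots.txt change I have in mind would look something like this (the /products/ path being the new folder from the redirect step above, and the widget URL just an example):

```
# Block crawling of the relocated product pages only
User-agent: *
Disallow: /products/
```

Everything outside /products/ would stay crawlable as normal.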
I'd really appreciate your thoughts on this plan, and on whether returning a 410 (Gone) response for these products would be a better approach!