
Loving My Job Today

5 replies to this topic

#1 qwerty


    HR 10

  • Moderator
  • 8,695 posts
  • Location:Somerville, MA

Posted 12 March 2013 - 01:01 PM

You know what I love about my job? Does BBCode include a rant tag?


Let's say, just hypothetically, that you're the new SEO Manager at a company that hasn't bothered with real SEO in many years. Sure, they outsourced to a few consultants and agencies over the years, and that led to them creating labyrinths of spammy pages, hidden text, crazy spammy CSS classes with names like "SEO" so you know what their excuse was for doing what they did, etc. etc., but for the most part SEO (even terrible SEO) hasn't even been a consideration most of the time.


Consequently, you find that you've got sites with a lot of pages with duplicate titles. A lot of them. Enough that exporting a list of them from Webmaster Tools isn't sufficient, because it will only give you "30 of 50" or "30 of 205." So what do you do? You go to Google and run a search like [intitle:"the phrase in all those titles" site:www.domain.com]. And you get 760 results. And that's not 760 results of pages with titles of "the phrase in all those titles," because Google has been kind and considerate enough to "fix" those titles when they appear in the SERP, so they're mostly something that's actually relevant to the content on the page, followed by "the phrase in all those titles."


So while you know that plenty of these pages actually have a title that consists of "the phrase in all those titles" and nothing else, some of them are bound to be pages with titles that contain "the phrase in all those titles" along with something else, but you have no way of identifying which of them are which without clicking through to each... and... every... one... of... them.
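
For illustration only: once the real titles are available as (URL, title) pairs from some source other than the SERP, separating the two groups described above is a small job. This is a minimal sketch, not anything from the thread; the function name `split_by_title` and the sample URLs and titles are hypothetical, and it assumes that differences in case and whitespace should be ignored.

```python
def split_by_title(pages, phrase):
    """Split (url, title) pairs into two lists of URLs:
    titles that are exactly the phrase, and longer titles that contain it."""

    def norm(text):
        # Collapse runs of whitespace and ignore case so near-identical titles match.
        return " ".join(text.split()).casefold()

    target = norm(phrase)
    exact, partial = [], []
    for url, title in pages:
        cleaned = norm(title)
        if cleaned == target:
            exact.append(url)      # the title is the phrase and nothing else
        elif target in cleaned:
            partial.append(url)    # the phrase plus something page-specific
    return exact, partial


# Hypothetical sample data, not taken from any real site.
pages = [
    ("/a", "Acme Widgets"),
    ("/b", "Blue Widget | Acme Widgets"),
    ("/c", " acme  widgets "),
    ("/d", "Contact"),
]
exact, partial = split_by_title(pages, "Acme Widgets")
```

The `exact` list is the one that needs new titles; the `partial` list already has something page-specific in front of the shared phrase.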



#2 torka


    Vintage Babe

  • Moderator
  • 4,825 posts
  • Location:Triangle area, NC, USA, Earth (usually)

Posted 12 March 2013 - 01:21 PM

Oh, come on, Bob -- suck it up. You're only talking about 760 pages, after all. :giggle:


Here, have one or two of these. I guarantee the page surfing will be much more enjoyable: :stout: :bubbly: :ale:


And some snacks to help you keep your strength up: :whopper:


Now, back to clicking! :whip:


--Torka :propeller:

#3 Mikl


    HR 5

  • Active Members
  • 345 posts
  • Location:Edinburgh, Scotland

Posted 13 March 2013 - 09:19 AM

Do these pages all exist as physical files somewhere (as opposed to them being generated on demand)?


If so, then it would surely be a trivial task for a programmer to write a quick bit of code to extract the title tags from each of the files, and then write them to a text file, ready for you to open in, say, Excel.


If you don't have access to programming skills, there must be a regular expression tool you could use to do the job (grep is one that comes to mind).
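
A rough sketch of the "quick bit of code" idea, assuming the pages really do exist as `.html` files under one directory. It uses only the Python standard library; the names `TitleParser`, `extract_title` and `write_title_report` are made up for this example, and the output is a CSV file that Excel can open.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TITLE> is handled too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse line breaks and repeated spaces into single spaces.
        return " ".join("".join(self._parts).split())


def extract_title(html_text):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


def write_title_report(root_dir, out_csv):
    """Write one (file, title) row per .html file under root_dir."""
    root = Path(root_dir)
    rows = []
    for path in sorted(root.rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append((str(path.relative_to(root)), extract_title(text)))
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "title"])
        writer.writerows(rows)
    return rows
```

Calling `write_title_report("path/to/site", "titles.csv")` (both paths are placeholders) would produce a two-column file that can be sorted by title to group the duplicates. It only helps for static files; dynamically generated pages would have to be fetched or mirrored first.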



#4 qwerty


    HR 10

  • Moderator
  • 8,695 posts
  • Location:Somerville, MA

Posted 13 March 2013 - 09:56 AM

Based on the URLs, I'm thinking most of these pages are dynamically generated. For what it's worth, that probably means it will be easy to fix once I've got a list of them all -- just have the developers write a rule that creates a title and fill in a blank based on an ID lookup.
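
A minimal sketch of what such a rule might look like, purely as an illustration. The function `build_title`, the `names_by_id` mapping and the "Name | Site" pattern are all assumptions; the real lookup would live in whatever system generates the pages.

```python
def build_title(page_id, names_by_id, site_name):
    """Build a page title by filling in a blank from an ID lookup.

    names_by_id is a hypothetical mapping from page ID to a display name.
    """
    name = names_by_id.get(page_id)
    if not name or not name.strip():
        # Fall back to the ID itself so that pages missing from the lookup
        # still do not end up sharing one identical title.
        return f"Page {page_id} | {site_name}"
    return f"{name.strip()} | {site_name}"


# Hypothetical data for illustration.
names = {42: "Blue Widget", 43: "  Red Widget "}
titles = [build_title(pid, names, "Acme") for pid in (42, 43, 7)]
```

The fallback matters as much as the main branch: a rule that returns the bare site name for unknown IDs would recreate the duplicate-title problem.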


I'll look into grep. That sounds like fun too.

#5 chrishirst


    A not so moderate moderator.

  • Moderator
  • 7,718 posts
  • Location:Blackpool UK

Posted 16 March 2013 - 01:41 PM

Why not run Xenu's Link Sleuth on the site(s), or use HTTrack to build a site mirror?

#6 qwerty


    HR 10

  • Moderator
  • 8,695 posts
  • Location:Somerville, MA

Posted 16 March 2013 - 02:39 PM

It's a complicated situation. There are pages that no longer exist, but have been 302 redirected, so they're still indexed. There are pages that are only accessible when the user is logged in, so search engines don't see them at all. I've crawled the site with both Xenu and Screaming Frog, and I've found things that way, but not everything.

