
Using A Headless Browser To Render Pages For Bots

4 replies to this topic

#1 qwerty


    HR 10

  • Moderator
  • 8,695 posts
  • Location:Somerville, MA

Posted 30 April 2014 - 07:59 AM

We have a mobile site that generates pages using a remote script that makes a bunch of API calls. Consequently, when Googlebot crawls these pages, it sees essentially a blank page: it can pick up the title, but nothing at all in the body.


One of our developers pointed me to a site that advertises a service for getting around this kind of issue. They use a headless browser to download the pages and fully render them, then they save the source code of the rendered pages on their server. When a bot requests the page from us, we use a proxy to feed them the pre-rendered version of the page.
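For what it's worth, the bot-detection half of that setup usually amounts to a user-agent check sitting in front of the proxy. Here's a minimal sketch of that idea; the token list and function names are my own illustration, not the vendor's actual implementation:

```python
# Illustrative sketch of user-agent sniffing for a pre-render proxy.
# The token list and names are assumptions, not any vendor's API.
BOT_TOKENS = ("googlebot", "bingbot", "yandexbot", "baiduspider")

def is_crawler(user_agent: str) -> bool:
    """Crude check: does the user-agent string name a known crawler?"""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)

def choose_response(user_agent: str, live_page: str, snapshot: str) -> str:
    """Serve the saved, fully rendered snapshot to crawlers and the
    normal JavaScript-driven page to everyone else."""
    return snapshot if is_crawler(user_agent) else live_page
```

Whether this counts as cloaking hinges entirely on the snapshot being the same content a browser would have rendered for a human visitor.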


Sound like cloaking to you? It does to me. After all, we'd be sniffing specifically for search engine spiders. According to the company that provides this service, it isn't cloaking: even though you're sniffing for the bot, you're presenting it with the exact same page a user sees. They go on to say that they're doing exactly what Google recommends on its page about making AJAX applications crawlable. That makes sense to me, but I'm not 100% convinced this won't get us in trouble, mostly because the technology is over my head, so I can't really be sure that what I'm reading is trustworthy.

#2 Ron Carnell


    HR 6

  • Moderator
  • 968 posts
  • Location:Michigan USA

Posted 30 April 2014 - 08:28 AM

I had the same concern, Bob, over a year ago when someone asked me to look at a site they were hosting with Wix. Wix does exactly what you've described, apparently to every page on their fairly substantial network, and they've been doing it for at least the last year.


Go to any page on a Wix server and view the source code. You'll see this little snippet in the HEAD:


This site's SEO content, such as meta tags and headers, is not here.
This is because search engines, like Google, actually crawl the site's homepage via http://www.example.com/?_escaped_fragment_=
Internal pages, like “REFERENCES”, also have their own special search engine versions, for example: http://www.example.com/?_escaped_fragment_=what-others-are-saying%2Fckwj
If you're looking for this site's SEO content, that's where you can view it.
Want more information about Ajax page crawling? Read Google's explanation here: http://bit.ly/ajaxcrawling


I've changed the domain name, but can PM you the URL if you'd like to see it, Bob.


Frankly, I've never been comfortable walking on the bleeding edge, but I can't argue with the results. The site in question has survived on Wix for nearly two years and ranks quite well for its targeted search terms. <shrugs shoulders> It seems to work?

#3 torka


    Vintage Babe

  • Moderator
  • 4,825 posts
  • Location:Triangle area, NC, USA, Earth (usually)

Posted 30 April 2014 - 08:35 AM

If I understand the documentation there, you're not actually sniffing for bots. You're simply serving your content at a very specifically formatted hash-bang URL, which tells Google the page contains AJAX content that would normally be uncrawlable. Google then goes looking for the straight-HTML version at another specially formatted version of the URL and indexes the content from there, but it indexes it under the first version of the URL (which, if I'm reading it correctly, is also the URL humans visit when they come to your page).
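That URL translation is mechanical: the crawler takes everything after the #! and re-requests the page with it percent-encoded into an _escaped_fragment_ query parameter. A small sketch of the mapping, just to make the scheme concrete (the helper is mine; the example output matches the Wix snippet quoted above):

```python
from urllib.parse import quote

def to_escaped_fragment(url: str) -> str:
    """Rewrite a #! URL into the form Google's crawler actually fetches,
    per Google's AJAX crawling scheme."""
    if "#!" not in url:
        return url  # nothing to translate
    base, fragment = url.split("#!", 1)
    separator = "&" if "?" in base else "?"
    # Everything after the #! is percent-encoded into the query parameter.
    return base + separator + "_escaped_fragment_=" + quote(fragment, safe="")
```

So `http://www.example.com/#!what-others-are-saying/ckwj` becomes `http://www.example.com/?_escaped_fragment_=what-others-are-saying%2Fckwj`, and the server is expected to answer that second URL with a plain-HTML snapshot.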


They also describe an alternative META element you can use if you don't want the hash-bang suffix on your URL (say, for your home page).
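That META opt-in is a one-line tag in the page's HEAD, `<meta name="fragment" content="!">`, which tells the crawler to re-fetch the same URL with an empty _escaped_fragment_ parameter. Spotting that request on the server side is just a query-string check; a minimal sketch (the helper name is my own):

```python
from urllib.parse import urlparse, parse_qs

def wants_snapshot(url: str) -> bool:
    """True when the crawler is asking for the pre-rendered HTML snapshot,
    i.e. the request URL carries an _escaped_fragment_ parameter (which is
    empty for pages that opted in via <meta name="fragment" content="!">)."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return "_escaped_fragment_" in query
```

A server would route any request where this returns True to the saved snapshot instead of the JavaScript-driven page.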


So, assuming that this company is telling the truth and what they do is actually in conformance with what Google says to do, I think you're OK.


I mean, since it was Google itself that published this "how to," I don't see why they would have a problem with somebody actually implementing it. :)


--Torka :oldfogey:

#4 qwerty


    HR 10

  • Moderator
  • 8,695 posts
  • Location:Somerville, MA

Posted 30 April 2014 - 09:06 AM

This isn't exactly what Wix sites do. I think the intent is similar, but as far as I can tell it's a very different method.


Still, it does look a lot like what Google describes as its recommended method. Now I'm wondering whether our Operations people would be happier having us do all of this in-house rather than relying on a third party's servers to serve the pre-rendered pages to bots.

#5 Michael Martinez


    HR 10

  • Active Members
  • 5,325 posts
  • Location:Georgia

Posted 30 April 2014 - 09:24 AM

The intent behind cloaking is to deceive search engines, but cloaking itself is accomplished through technology that is used for other purposes, and the search engines know that.

For example, many websites include code intended only for Internet Explorer. Hypothetically, that code could be used to alter what the IE user sees (in fact, that was once quite common on sites that "look best in Internet Explorer"). I remember landing on sites that actually FORCED me to use IE to see their content. None of them was ever penalized, to the best of my recollection.

Another example: sites that detect the visitor's platform and serve custom-formatted content, such as placeholder images instead of video. Services that geotarget content, like Hulu and Netflix, do this a lot. They are not penalized.

If the intention behind the proxied content is simply to show the search engines what visitors are seeing, then I wouldn't worry too much.
