Solving SEO with Headless Chrome (Polymer Summit 2017)

If you manage to pick up on my accent in the last five words, I am indeed Australian and it’s honored to be followed up by Trey, my fellow Aussie, as well prior to joining this team. I’d worked on the beloved chrome dev tools, one of my smallest, but maybe my greatest contribution was adding the ability to rearrange tabs in dev tools, there’s probably the greatest five lines I’ve ever written.

I did work another five other features. So if you find me afterwards feel free to ask me about them, and I might share dev tools trickle to more recently, I’ve had the humbling experience of building web components at all and witnessing all the incredible components that all of you have built and published. For example, the one and only Pokemon selector and if you’re the person who says but there’s a Hanyu, only 151 pokemon in the original set well there’s even an option that lets you set that too, so all kudos to Sammy.

For this, it was, however, in the process of building web components at all, which brings us to what we’re here to talk about today. So, first I’m going to cover my story of how I came to encounter this SEO problem while building web components. Our dog will then look at how I used have less chrome to solve this before diving into all the details of how that actually works and how you can use it. So I’m going to take a step back for a moment and talk about what I learnt in the process of building web components.

A talk. The first thing I learned was how the platform supports encapsulation through the use of web components with this encapsulation comes with inherent code reuse, which leads to a specific architecture. I also learnt about progressive web apps and how they can provide us with fast engaging experiences. I learned how the platform provides api’s such as service workers, to help enable those experiences, as I learned how to compose web components, to build a progressive web.

App we’ve heard from Kevin yesterday about the purple pattern: push render precache lazy load as a method of optimizing delivery of this application to the user and one of the architectures which enables us to utilize. The purple panel is the app shell model. It provides us with instant, reliable performance by using an aggressively cached app shell. You can see that for all the requests which hit our server, we serve the entry point file which we serve regardless of the route.

The client then requests the app shell, which is similar, but because the same URL across the application, we can combine that with a serviceworker to achieve near-instant loading on repeated visits. The shell is then responsible for looking at the actual route that was requested and then request. The necessary resources to render that route. So this point I’d learned how to build a progressive web app using client-side technologies like web components in polymer and how to use patterns such as the purple pan to deliver this application quickly to the user.

Then there’s the elephant in the room SEO for some of these BOTS they’re, basically just running curl with that URL and stop right there, no rendering no JavaScript. So what are we left with with this PWA that we built using the app shell model? We’re left with just your entry point file, which has no information in it at all, and in fact it’s the same generic entry point file that you serve across your entire application.

So this is particularly problematic for web components, which require JavaScript to be executed for them to be useful. This issue applies to all search engine indexes that don’t render JavaScript, but it also applies to the plethora of link rendering BOTS out there. There’s a social BOTS like Facebook and to but don’t forget the enormous number of link renting BOTS such as slack hangouts Gmail, you name it.

So what is it about the app shell model that I’d really like to keep well? For me, this approach pushes our application complexity out to the client. You can see that the server has no understanding of routes. It just serves the entry point file and he has no real understanding of what the user is actually trying to achieve. This allows our server to be significantly decoupled from the front end application, since it now only needs to expose a simple API to read and manipulate data.

The client that we pushed out to the application that we pushed out to the client is then responsible for servicing. This data to the user and mediating user user interactions to manipulate this data, so I asked: can we keep this simple architecture that we know and we love and also solve this SEO use case with zero performance cost? So then we thought what, if we just use headless chrome to render on our behalf, so here’s a breakdown of how that would work.

We have our regular users who are making a request and they would like a cat picture because who wouldn’t and as part of this approach, we ask our robot and to answer this, we look at the user agent string and check if it’s an own bot that Doesn’t render in this case the user can render so we serve the page as we normally would. The server responds with the fetch cat picture function and then the client can go and execute that function to get the rendered result by the way.

This is one of my kittens, which I fostered recently, which is super adorable. Now, when we encounter a boss, we can look at a user agent string and determine that they don’t render, and instead of serving that fetch cat picture function, we fire for a quest to headless Chrome to render this page on our behalf, and then we send the Serialized rendered response back to the bar, so they can see the full contents of the page.

So I built a proof-of-concept of this approach for web components rock and it worked. I wrote a medium post about it, and people really interested in this approach and want to see more of it. So, based on this response, I eventually decided that instead of my hacky solution that I would build it properly but then came the most challenging part of any project and I know you’ve all experienced it as well naming.

So I asked on our team chat for some suggestions and I got a tongue, so these are some of our top ones. There’s some great ones in their power renders use the platform as a renderer. However, today I’m very pleased to introduce render Tron. Let me render that, for you. Brenda Tron is a doc arised, headless, chrome, rendering solution. So that’s a mouthful, so let’s break it down. First off what is docker and why did I use it? Well, no one knows what it means, but it’s provocative in all seriousness.

Docker containers allow you to create lightweight images and standalone executable packages which isolate software from its surrounding environment in render Tron. We have headless chrome packaged up in this container so that you can easily clone and deploy this to wherever you like. So what about headless chrome? It was introduced in chrome, 59 for Linux and Mac chrome 60 for Windows, and it allows chrome to be run in environments which don’t have a UI interface such as a server.

This means that you can now use Chrome as part of any any part of your tool chain. You can use it for automated testing. You can use it for measuring the performance of your application, generating PDFs amongst many other things. Headless chrome itself exposes a really basic JSON API for managing tabs with most of the power coming from the dev tools protocol. All of dev tools is built on top of this protocol.

So it’s a pretty powerful API, and one of the key reasons that headless chrome is great. Is that now we’re bringing the latest and greatest from chrome to ensure that all the latest web platform features are supported with render Truong? This means that net your SEO can now be a first-class environment which is no different, the rest of your users. So just a quick shout out. This all sounds really interesting to you and you would like to include headless chrome in some other way in your to a chain.

There’s a brand new library, node library that was published just last week, that exposes a high level API to control chrome, while also bundling all of chrome inside that node package. So you can check it out on github at google chrome, slash puppeteer, so we’ve looked at the high level of how headless chrome can fit into your application to fulfill your SEO needs now it’s time to dive to how it works.

But I’ve been talking a lot. So, who wants to see render tron in action alright. So this is the hacker news PWA created by some of my awesome colleagues and it’s built using polymer and web components. It loads really fast and all-round performs pretty well. We can see that there’s a separate network requests which loads the main content that we see and we can guess that it’s affected by this SEO problem, since it uses web components which require JavaScript and it pulls the in data asynchronously.

So one quick way to verify this is by disabling JavaScript and refreshing the page, and once we do that, we can see that we still get the app header, since that was in the initial request. But we lose the main content of the page which isn’t good. So we jump over to render Truong the headless chrome service that is meant to render and serialize this for you. So I wrote this UI as a quick way to put in a URL and test the output from render Tron so first off.

What are we hoping to see because these bots only perform one request? We want to see that whole page come back in that one network request. We also want to see that it doesn’t need any JavaScript to do this. So take a look, I’m going to put in the hacker news URL and tell render Tron to render and serialize this and that using web components, and it renders correctly I’m going to disable JavaScript and verify that it still works.

So you can see it’s still there and it all comes back in that single network requests render tron automatically detects. When your PWA has completed loading. It looks at the page load event and ensures that it has fired. But we know that’s a really poor indication of when the page is actually completed. Loading, so Rena Tron also ensures that any async work has been completed and it also looks at your network requests to make sure they’re finished as well.

In total, you have a ten-second rendering budget. This doesn’t mean that it waits 10 seconds, though it’ll finish as soon as your rendering is complete. If this is insufficient for you, you can also fire a custom event which signals to rent Ron that your PWA has completed. Loading serializing web components is tricky because of shadow Dom which it straps away part of the dom tree so to keep things simple.

Rennet ron uses shady Dom, which polyfills shadow Dom this allows render tron to effectively serialize the dom tree so that it can be preserved. In the output, so let’s take a look at the news PWA, which you’ve all seen – and it’s also built by some of my other colleagues and we’ll plug that in to render tron will then ask render tron to render this as well and that I’m also using Web components, and then we have it.

So what do you need to do to enable this behavior with polymer 1? This is super easy and render tron doesn’t actually need to do anything simply append D’Amico’s shady to the URLs that you pass to render Tron and polymer 1 will ensure that shady Dom is used with polymer 2 and with web web components. V1. It’s recommended you use web components, loader jeaious, which pulls in all the right polyfills on different browsers.

You then set a flag to render tron tell it that telling it that you’re using web components, and it will ensure that the necessary polyfills that it needs for serialization get enabled so another feature of render Tron is that it lets you set HTTP status codes, these Status codes are used by indexes as important signals, for example, if he comes across a 404, it’s not going to link to that page, because that will be a really poor search result.

Now server, though, it’s still returning that entry point bar with a status code of 200. Okay, so it looks like every URL exists. Rena-Chan lets you configure that status code from within your PW, a which understands when a page is invalid, simply add meta tags. Dynamically is fine to signal to render on what the status code should be render. Tron will then pick these up and return that status code to the bot, so this approach isn’t specific to polymer or even web components, let’s plug in Fahnestock google.

Com and sees what happens when we serialize it. So that looks pretty good. Who can guess what javascript library was used to build? Google fonts angular render Trond works with any and all client-side technologies that work in Chrome and whose Dom tree can be serialized. The render tron endpoint also features screenshot capabilities, so that you can check that headless, chrome and the load detecting function are performing as you expect.

Unfortunately, this service is not fast for each URL that we render we spin up headless Chrome to render that entire page, so performance is strictly tied to the performance of your PWA. Renat Ron does, however, implement a perfect cache. This means that if we have rendered the same page within a certain cache freshness threshold will serve the cached response instead of rear-ending it again. So how can you get your hands on this today and how do you use it? Well, first, you need to deploy the random tron service to an end.

You’ll need to clone the github repo at Google, Chrome, slash, magnetron, and it’s built primarily for Google cloud. So it’s easy to deploy there. But if you remember this is a darker container, so you can deploy this to anywhere, which supports a docker image. So to make things simple for you to test our. We have the demo service endpoint, which you can hit at render Tron appspot.Com and that’s the one with the UI that we saw earlier.

It is not intended to be used as a production endpoint. However, you are welcome to use it, but we make no guarantees on uptime. Having this as a ready to use service is something we might consider based on the interest receive. So, just in case you’re wondering my boss’s twitter handle is at met, Matt s McNulty, just in case. You want to tell him how awesome I am so once we have that endpoint up you’re going to need to install some middleware in your application to do the user agent splitting that I was talking about earlier.

So this middleware needs to look at the user agent figure out whether or not they can render and if not proxy, the requests through the render tron endpoint, if you’re using purple server, which is a node server designed to serve production applications using purple. You simply need to specify the bot proxy option and provide it with your rennet on endpoint, if you’re using Express, there’s a middleware that you can include directly by saying app, don’t use render on top make middleware with the proxy endpoint and whether or not you’re using Web components, if you’re not using either of these check the docs for a list of community maintained bit aware, there’s a firebase function there, as well as a list of existing middleware that render China is compatible with.

If it’s not listed, it’s also fairly simple to roll. Your own middleware by simply proxying based on the user agent string, and that’s it, that’s all the changes you need to make to use, render tron today and all these bots can now be happy. Brenda Tron is available to use today compatible with any client-side technologies, including both polymer 1 and polymer 2. Thank you.

Share this:

By Jimmy Dagger

Leave a comment Cancel reply