Show AI crawlers what you want them to see
Arnout says: “People really need to look at their technical setup.
With that, I mean the rendered version versus the raw HTML – especially with a lot of AI crawlers not rendering yet, and also because I've seen a lot of discrepancies in that area.”
Is the rendered version typically quite different for different search engines?
“As with a lot of things in SEO, it depends.
Sometimes, when websites are built using JavaScript frameworks, the content is actually different in the raw HTML versus the rendered HTML.
Titles might be different. There might not be schema markup, there might be different headings, etc., because those can be changed by the execution of JavaScript.
That can severely impact the discoverability of your page.”
What does this mean for the use of JavaScript over the next few years?
“Crawling a website using a rendered version takes a lot of energy, because you actually need to render the pages. Most crawlers would rather just scrape the page, get the raw HTML, and get all the elements.
Currently, for a lot of AI crawlers – OpenAI, Perplexity, etc. – it's just too expensive to render pages. Microsoft and Google are doing it. The thing is, you can properly implement this with pre-rendering solutions and hybrid solutions. There are loads of ways to work around it, so it’s not the death of JavaScript, but I feel that JavaScript has had a lot of technical difficulties that feel scary for most people in SEO.
Most people don't even know how to do this. It took me a while to figure it out. It's definitely not the death of JavaScript, because JavaScript has given us a lot of interactive elements and all kinds of things. However, it's something people should be aware of, especially with all these new crawlers popping up.
You want your content to be seen by AI crawlers as well as search engines.”
What does an SEO need to do in order to determine how their website is seen by AI crawlers?
“For one, there's a free extension in Chrome called View Rendered Source, which will show you the difference between the raw HTML and the rendered source view.
The other thing is, if you use a crawler, look at the difference between the raw HTML and the rendered HTML. I really enjoy using Sitebulb for that because it will literally show you which links were added using JavaScript, what piece of content was changed because of JavaScript, which images are being rendered through the execution of JavaScript, etc.”
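The comparison described above can be approximated with a small script. This is a minimal sketch, not how View Rendered Source or Sitebulb work internally: it assumes you have already saved the raw HTML (the server response) and the rendered HTML (for example, copied from the browser's DOM inspector) as two strings. The names `CoreElements`, `extract` and `compare`, and the sample pages, are made up for illustration.

```python
from html.parser import HTMLParser


class CoreElements(HTMLParser):
    """Collects the title, h1-h3 headings, link targets and image sources."""

    def __init__(self):
        super().__init__()
        self.found = {"title": [], "headings": [], "links": [], "images": []}
        self._open = None  # tag whose text is currently being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1", "h2", "h3"):
            self._open = tag
            self._text = []
        elif tag == "a" and attrs.get("href"):
            self.found["links"].append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.found["images"].append(attrs["src"])

    def handle_data(self, data):
        if self._open:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._open:
            return
        text = " ".join("".join(self._text).split())  # normalise whitespace
        if tag == "title":
            self.found["title"].append(text)
        else:
            self.found["headings"].append(f"{tag}: {text}")
        self._open = None


def extract(html):
    parser = CoreElements()
    parser.feed(html)
    parser.close()
    return parser.found


def compare(raw_html, rendered_html):
    """Per element type, list what exists only after rendering and only in raw."""
    raw, rendered = extract(raw_html), extract(rendered_html)
    report = {}
    for key in raw:
        only_rendered = [v for v in rendered[key] if v not in raw[key]]
        only_raw = [v for v in raw[key] if v not in rendered[key]]
        if only_rendered or only_raw:
            report[key] = {"only_rendered": only_rendered, "only_raw": only_raw}
    return report


# Hypothetical page: an empty app shell that JavaScript fills in later.
RAW_HTML = '<html><head><title>Loading...</title></head><body><div id="app"></div></body></html>'
RENDERED_HTML = (
    "<html><head><title>Blue Widget</title></head><body>"
    '<h1>Blue Widget</h1><a href="/widgets">All widgets</a><img src="/blue.png">'
    "</body></html>"
)

for element, diff in compare(RAW_HTML, RENDERED_HTML).items():
    print(element, diff)
```

Anything listed under `only_rendered` is content that a crawler reading only the raw HTML would not see.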
How do you determine how AI search engines see things differently, and what do you do to ensure that they have a better idea of what's on your web pages?
“Basically, they only look at the raw HTML. What you see in the raw HTML is what they can see.
If your images aren't visible in there, then they won't see the images. If your page title isn't filled in or there are no headings in the raw HTML, then that's what they are seeing.
In an ideal situation, the raw HTML and the rendered HTML are basically the same. Then, your web page will be a lot faster because no JavaScript execution is needed to render the page. Is that always possible? No, but you should get as close as possible because it makes everything faster, and there won't be any indexation problems. It's way easier.
Back in the day, we would just look at the raw HTML, and that was it. Nowadays it's a lot more difficult.”
As an SEO, do you need to prioritise certain elements that should be incorporated, or do you prioritise certain pages?
“The basics are the most important. It's making sure that the core content of the page can be read. This means the headings, page title, images, etc. – because that's what you want in the index. If certain elements are not working, or a footer is not working, that's not the biggest issue.
You need to prioritise having all the elements in the raw HTML, and they should not change.
What I have seen a lot is that, when the page gets rendered, the elements are still the same, but they've been taken away and inserted again, which makes search engines think, ‘What's happening here? I thought I had the H1, and now the H1 is somewhere else.’ It's still the same H1, but it is confusing the crawler.”
If you have a JavaScript menu, do you need to prioritise the inclusion of HTML links to all the other pages in your site, or is an XML sitemap sufficient?
“An XML sitemap is always a good idea. The challenge with links in these mega menus is when you switch off JavaScript.
Try doing it. Switch off JavaScript in your browser (using a NoScript plugin or whatever) and then try browsing your own website. It might be very hard.
In an ideal situation, you want the internal linking to keep working, but the biggest priority is getting the content and the right markup – the heading, the page title, etc. – into the index. That's the most important part.”
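One way to approximate the "switch off JavaScript" test in code is to list the sitemap URLs that have no plain HTML link in the raw source of a page. The sketch below is an illustration under assumptions: it checks a single page, it assumes a standard `urlset` sitemap, and the function names and `example.com` data are invented.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags present in the HTML as served."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def raw_links(raw_html, page_url):
    collector = LinkCollector()
    collector.feed(raw_html)
    collector.close()
    # Resolve relative links against the page they were found on.
    return {urljoin(page_url, href) for href in collector.hrefs}


def sitemap_urls(sitemap_xml):
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def not_linked_without_js(sitemap_xml, raw_html, page_url):
    """Sitemap URLs that have no <a href> in this page's raw HTML."""
    return sorted(sitemap_urls(sitemap_xml) - raw_links(raw_html, page_url))


# Hypothetical data: the "Bags" menu item is a button that JavaScript wires up.
SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/shoes</loc></url>"
    "<url><loc>https://example.com/bags</loc></url>"
    "</urlset>"
)
RAW_NAV = '<nav><a href="/">Home</a><a href="/shoes">Shoes</a><button data-menu="bags">Bags</button></nav>'

print(not_linked_without_js(SITEMAP, RAW_NAV, "https://example.com/"))
```

A URL in that list is still discoverable through the sitemap, but a crawler that does not execute JavaScript cannot reach it by following links from this page.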
Does this impact brand visibility?
“Yes, but most search engines will first crawl and index the raw HTML, then render the page, and then compare it to the original. Then they will think, ‘Should I overwrite this?’
Say you used JavaScript to insert structured data. That might be an issue because there might be a delay in the structured data being rendered. For instance, a product might have been out of stock, but now it's in stock on your website. However, because the page hasn't been rendered, Google still thinks it's out of stock.
That's why those particular elements are really important to have in your raw HTML, not only in your rendered HTML.”
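To check whether structured data is already present before rendering, you can read the JSON-LD blocks out of both versions of a page. This is a rough sketch that assumes a single `Product` object with a single `offers` object; real markup can be nested differently, and the `availability` helper and sample pages are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Parses every <script type="application/ld+json"> block in the HTML."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks in this sketch


def availability(html):
    """Return the Product offer availability found in the HTML, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        if isinstance(block, dict) and block.get("@type") == "Product":
            offers = block.get("offers")
            if isinstance(offers, dict):
                return offers.get("availability")
    return None


# Hypothetical product page where JavaScript injects the structured data.
RAW_PAGE = "<html><head><title>Blue Widget</title></head><body></body></html>"
RENDERED_PAGE = (
    "<html><head><title>Blue Widget</title>"
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Product", "name": "Blue Widget",'
    ' "offers": {"@type": "Offer", "availability": "https://schema.org/InStock"}}'
    "</script></head><body></body></html>"
)

print("raw:", availability(RAW_PAGE))
print("rendered:", availability(RENDERED_PAGE))
```

If the raw version returns `None` while the rendered version returns a value, the structured data depends on rendering and is subject to the delay described above.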
How do you know which elements are likely to have the biggest impact on rankings?
“Again, it depends.
Say you have review stars. If you're using JavaScript to insert that part of the code, it might not appear for all the pages immediately. Then, that is an important element.
Say your headings are changing; there are no headings in the raw HTML, but there are headings in the rendered version. That will impact the ranking of a new article. Eventually, it will fix itself once the rendered version gets indexed. Initially, however, you won't benefit from the work you did.”
Does this mean it is even more important that your CMS incorporates the non-rendered version as readable by modern search engines?
“Yes. I see a lot of headless CMSs popping up, like Storyblok, and they usually have a JavaScript-based app. If you don’t use a pre-rendering solution – a server version that will render the page and serve a fully rendered page to both search engines and users – it will impact everything.
Any CMS can be adjusted to be able to serve this. It's a little more work, but it can be done. Most of the traditional CMSs use less JavaScript, so the core elements will just be there.
You should just be aware, and I feel a lot of people are not aware. I see a lot of use cases for React-based applications or websites, but there are also a lot of cases where you shouldn't.
If you're making loads of changes, you don't need a developer for everything. You are better off going for a fairly standard CMS out of the box – whether it's WordPress, Drupal, or one of the builders like Wix or Duda – rather than building something with a React front end, because that will create these problems.
We should be aware that this is happening, and I see a lot of people who are a little scared of doing SEO this way.”
Do you implement changes on a test-and-learn basis to analyse the impact of what you're doing, and how do you demonstrate that impact to stakeholders?
“First, you need to understand what's happening, using tools like View Rendered Source.
I've had projects where what was served also depended on the user agent and on your IP location, that is, which country you are in. It's difficult.
Once you've understood what the problem is, you can build a case around it fairly easily. If the structured data for a review snippet is gone for some pages, and it's still there on other pages, you can show that the click-through rate is a lot higher for the pages with it than without it, so it's highly likely that it is a result of this issue.”
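The click-through comparison can be done with a few lines of arithmetic. The figures below are invented purely to show the calculation; in practice they would come from your own search performance export, and a difference like this suggests a link rather than proving one.

```python
def ctr(rows):
    """Aggregate click-through rate: total clicks divided by total impressions."""
    clicks = sum(row["clicks"] for row in rows)
    impressions = sum(row["impressions"] for row in rows)
    return clicks / impressions if impressions else 0.0


# Made-up numbers for illustration only.
pages = [
    {"url": "/product-a", "has_review_snippet": True, "clicks": 120, "impressions": 2000},
    {"url": "/product-b", "has_review_snippet": True, "clicks": 90, "impressions": 1500},
    {"url": "/product-c", "has_review_snippet": False, "clicks": 40, "impressions": 2000},
    {"url": "/product-d", "has_review_snippet": False, "clicks": 30, "impressions": 1500},
]

with_snippet = ctr([p for p in pages if p["has_review_snippet"]])
without_snippet = ctr([p for p in pages if not p["has_review_snippet"]])

print(f"CTR with review snippet:    {with_snippet:.1%}")
print(f"CTR without review snippet: {without_snippet:.1%}")
```

Comparing similar page types over the same period keeps the two groups as comparable as possible.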
How much of an SEO’s time should be spent on this?
“It depends. If you are going to work on a React-based platform, you should spend a lot of time fixing and monitoring this. Sometimes it breaks and you get unforeseen errors, like the pre-rendering stops working. Then, you have a big problem. Most people aren't looking at that.
It depends on your platform. If you don't use any JavaScript in the front end, or hardly any, it's less of an issue.”
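Monitoring can be as simple as a scheduled script that requests the raw HTML of a few key URLs and flags missing core elements. The following is a minimal sketch under assumptions: the list of required elements, the user agent string and the URLs are placeholders, and the regular expressions are deliberately crude.

```python
import re
import urllib.request

# Elements this sketch expects in the raw HTML; adjust to your own pages.
REQUIRED = {
    "title": re.compile(r"<title[^>]*>\s*[^<\s]", re.I),  # non-empty <title>
    "h1": re.compile(r"<h1[\s>]", re.I),
    "json-ld": re.compile(r"application/ld\+json", re.I),
}


def fetch_raw(url, user_agent="prerender-check/0.1"):
    """Fetch the HTML as served, without executing any JavaScript."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def missing_elements(raw_html):
    return [name for name, pattern in REQUIRED.items() if not pattern.search(raw_html)]


def check(urls, fetch=fetch_raw):
    """Map each URL with problems to the list of elements missing from its raw HTML."""
    problems = {}
    for url in urls:
        missing = missing_elements(fetch(url))
        if missing:
            problems[url] = missing
    return problems


# Offline demonstration with canned responses instead of real requests.
FAKE_PAGES = {
    "https://example.com/ok": (
        "<html><head><title>Blue Widget</title>"
        '<script type="application/ld+json">{}</script></head>'
        "<body><h1>Blue Widget</h1></body></html>"
    ),
    "https://example.com/broken": (
        '<html><head><title></title></head><body><div id="root"></div></body></html>'
    ),
}

print(check(FAKE_PAGES, fetch=FAKE_PAGES.get))
```

Passing a different `user_agent` to `fetch_raw` lets you see whether the server responds differently to different clients.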
Arnout, what's the key takeaway from the tip you shared today?
“You should not only look at the source of a page but also look at the rendered version.
View Rendered Source (the extension by Jon Hogg) is awesome for that. Sitebulb also has a great comparison in its crawler.
More people should just check this. I see a lot of people not checking this and failing to solve some issues.”
Arnout Hellemans is a Freelance Tech SEO and Analytics Consultant. Find out more over at OnlineMarkethink.com.