Analyzing the SERPs Using Natural Language Processing [Experiment]

As time goes on, the search engines we use every day, and that we as SEOs optimize for, continue to evolve. In October 2019, Google started rolling out the BERT algorithm to improve its understanding of natural language text.

“Bert is a natural language processing pre-training approach that can be used on a large body of text. It handles tasks such as entity recognition, part of speech tagging, and question-answering among other natural language processes. Bert helps Google understand natural language text from the Web.

Google has open sourced this technology, and others have created variations of BERT.”

Bill Slawski describing BERT

Gone are the days when keywords could be stuffed haphazardly throughout posts to game the system with so-called “quality content”.

Search engines are getting smarter, and creating content that provides value and showcases topical expertise is a must to rank competitively.

With these advances in natural language understanding, how exactly is this reflected in the SERPs?

One of the questions we had was, “Does Google reward tightly niched websites that stick to a single category?”

What is Natural Language Processing (NLP)?

Natural language processing is a branch of artificial intelligence that gives machines the ability to read and interpret human language.

How We Set Up the NLP Experiment

We wanted to keep this experiment narrow so that our analysis wasn’t spread too thin, so we focused on eCommerce and “Best” keywords.

We chose “Best” keywords because we’ve noticed a number of niche affiliate websites going this route.

Are niche websites able to outrank content powerhouses like NYMag?

Null Hypothesis

Websites that focus tightly on a niche are rewarded in the SERPs compared to broad websites.

Alternative Hypothesis

There is no relationship between having a niche website and being rewarded in the SERPs.

Using Google Cloud Natural Language API – Content Classification

  • Grabbed 105 keywords containing the word “Best” with a search volume greater than 1,000.
  • Pulled the top three ranking URLs for each of the 105 keywords (total search volume of 453,000).
  • Used Python to run the Google Cloud Natural Language API content classification on each of the URLs.

Note: Some URLs were lost along the way due to crawling limitations and API limitations.
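The classification step can be sketched roughly as follows. This is a minimal sketch and not the script used in the study: it assumes the `google-cloud-language` Python client is installed and Google Cloud credentials are configured, and the names `classify_page_text` and `top_category` are hypothetical helpers introduced here for illustration. Fetching and cleaning the page text is left out.

```python
def classify_page_text(text):
    """Return (category, confidence) pairs for a page's text.

    Assumes the google-cloud-language client library and valid
    Google Cloud credentials. The import is done lazily so that
    top_category() below can be used without either.
    """
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.classify_text(request={"document": document})
    return [(c.name, c.confidence) for c in response.categories]


def top_category(categories):
    """Pick the highest-confidence (category, confidence) pair, or None."""
    if not categories:
        return None
    return max(categories, key=lambda pair: pair[1])
```

Keeping only the highest-confidence category per URL is one possible design choice; a page can receive several categories, and the post does not say how those cases were handled.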

How does the Google Cloud Natural Language Processing Content Classification work?

Content Classification takes a document and returns a list of categories based on an analysis of its content. Each category comes with a confidence score between 0 and 1. Example:

URL: https://www.cnet.com/health/best-blood-pressure-monitor/
Category: /Health/Health Conditions/Heart & Hypertension
Confidence: 0.94

Documentation here: 

Initial Thoughts

There are factors like backlinks, mobile friendliness, technical SEO baseline, and much more that have an impact on ranking. The goal of this experiment was to see if having a better aligned site in terms of content will give an advantage in the SERPs.

Experiment Findings

Powerhouse content publishers own the top three

When looking at the domains that appeared most often in our study, we noticed a large number of big-name content publishers owning the SERPs. Coming into this, we expected more small niche websites to show up. But with publishers fielding larger content teams, competition for high-volume keywords is tough these days.

Powerhouse content publishers included sites like:

  • TechRadar
  • CNET
  • Good Housekeeping
  • NYTimes

79% of the domains in our study were powerhouse content publishers.
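Counting how often each domain appears among the ranking URLs can be sketched as below. This is an illustration under assumptions, not the study's code: `domain_of` and `domain_counts` are hypothetical helper names, and stripping a leading "www." is a simplification that does not handle other subdomains.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL, lowercased, without a leading "www."."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_counts(urls):
    """Count how many of the given URLs belong to each domain."""
    return Counter(domain_of(u) for u in urls)
```

Calling `domain_counts(urls).most_common(10)` on the full list of top-three URLs would then surface the publishers that show up most often.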

Most of the top three ranked pages have the same specific content classification

For pages ranking in the top three for each keyword, there was a high likelihood that the content categories returned by Google’s NLP API matched exactly.

88% of keywords had the same specific content classification for at least two of the top three ranked pages.
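A figure like this can be computed along the following lines. This is a sketch under assumptions: each keyword maps to the category of each of its top three pages (with `None` where a URL was lost), the function names are hypothetical, and the example data is made up with placeholder category labels and is not taken from the study.

```python
from collections import Counter


def has_majority_match(categories, minimum=2):
    """True if at least `minimum` of the pages share one category."""
    counts = Counter(c for c in categories if c is not None)
    return bool(counts) and max(counts.values()) >= minimum


def share_with_match(results, minimum=2):
    """Fraction of keywords whose top pages agree on a category."""
    if not results:
        return 0.0
    matched = sum(has_majority_match(cats, minimum) for cats in results.values())
    return matched / len(results)


# Made-up example data with placeholder categories.
example = {
    "best keyword one": ["/Category A/Sub 1", "/Category A/Sub 1", "/Category B"],
    "best keyword two": ["/Category A", "/Category B", "/Category C"],
}
print(share_with_match(example))  # 0.5
```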

Almost all of the top three ranked pages have the same broad content classification

Keeping the previous finding in mind, we looked at the broad content classifications to see how diverse they were across search results. Matches were noticeably more common at the broad level (one layer deep; e.g., “/Computers & Electronics/”).

95% of keywords had the same top-level content classification for the top three ranked pages.

This is interesting because content classification could be used as a tool to understand how your writing compares to the competition. We can sometimes go off on tangents when we write, and NLP content classification can show whether your content still lands in the same category as the top-ranking pages.
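Comparing pages at the broad level only requires the first segment of each category path. The sketch below assumes category paths in the slash-separated form shown earlier (e.g., “/Health/Health Conditions/Heart & Hypertension”); `broad_category` and `all_share_broad_category` are hypothetical helper names.

```python
def broad_category(path):
    """Return the top-level segment of a category path.

    "/Health/Health Conditions/Heart & Hypertension" -> "/Health"
    """
    parts = [p for p in path.split("/") if p]
    return "/" + parts[0] if parts else ""


def all_share_broad_category(paths):
    """True if every path falls under the same top-level category."""
    return len({broad_category(p) for p in paths}) == 1
```

To check your own draft, you could classify it, add its category to the list of competitor categories, and see whether `all_share_broad_category` still returns `True`.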

There is a strong content library focused on each category/topic

Because so many content powerhouses owned the SERPs, we decided to look at the different sections of their sites to see how they structure content and why they may have a noticeable advantage.

We would have run this through the NLP API, but it would be rather costly and time-consuming to grab all the URLs of 300+ domains and then analyze the text of all the pages. Maybe in the future!

For a website like The Spruce, you’ll see they have a section dedicated to home improvement reviews that houses roughly 70 reviews with category links pointing to more related reviews in different subtopics. 

It’s practically a home improvement review site on its own if you ignore the other parts of their site and start/navigate from there.

You’ll see similar things on another large website like the Wirecutter (owned by NYTimes). If you click into their headphones subcategory (electronics is the main category), you’ll see 24 hyper-focused posts on that category. 

Here is another example with a smaller site that has this focus on the homepage since they only create content about treadmills.

With this, it’s safe to say that site structure and usage of the hub and spoke model are the norm for top ranking websites. You could be a smaller website that has one hub (single product category focus), or a large online publication that structures it out with multiple hubs (various product categories). Having a strong library of content that increases the authority of your website on the topic will prove worthwhile.

I do want to note that we shouldn’t overlook the quality of content that these publishers put out. The Wirecutter puts out some of the best reviews on the market, with hands-on testing and frequent updates. But sometimes it feels like we’ve created great content and aren’t seeing the same returns. And as we know, there are a million and one factors that contribute to the results we see on Google today.

URL structure wasn’t too clear across the board

Also interesting is that these large publishers don’t use a nested URL structure for their product review pages, which may play down the impact of URL structure on SEO. They do use it for categories, though (e.g., https://www.nytimes.com/wirecutter/electronics/headphones/).

The answer appears to be deeper than the hypothesis laid out

Our thinking was that we would see more leaders in this space that had a tightly niched website, like www.treadmillreviews.net, but that wasn’t the case.

As we all know, so many factors impact rankings in Google that looking at it through this narrow a lens won’t provide all the answers. It does, however, reveal patterns that could be used to your advantage, which we’ll cover below.

With the large online publications leading the charge in this study, we’d have to reject the null hypothesis. There isn’t much evidence showing that niche websites can compete for “Best” keywords. 

This study looked at high-search-volume keywords, which naturally have high difficulty/competition. So it’s possible that focusing on keywords below a certain search volume threshold, or on different keywords, will show different results. Maybe Google is following user behavior that shows people enjoy reading about the best products from large reputable publishers (something, something, E-A-T perhaps?).

SEO Frameworks to Improve your Rankings

After analyzing the SERPs using NLP, we spotted a few patterns that appear to have a positive impact on the rankings of the pages.

Hub and Spoke Model or Topic Clusters

A hub and spoke model or topic cluster basically takes a library of content and organizes it so it’s easy to understand for users and search engines. 

A hub could be a product like headphones and the spokes would be reviews on the best wireless headphones, wired headphones, running headphones, and similar reviews.

By creating this type of format for your content to live in, internal links fall into place, leading back to hub pages. This keeps users engaged with your content, because they are more likely to stay when they can peruse content related to the piece they initially landed on.

You can learn more about these concepts from Animalz and HubSpot, who helped popularize them:

https://www.animalz.co/blog/seo-for-content-marketing/

https://blog.hubspot.com/marketing/topic-clusters-seo

Internal Linking

Something we noticed on all of these pages was that there was some sort of internal linking to connect related posts/reviews. This was done through various methods like text links and related review sections in the sidebar or above the footer.

Internal links help users and search engines navigate through your pages. They also give context to what your pages are about and define your site structure.

Learn more about internal linking best practices.

Update: Google introduced a “Product Reviews Update”

Google announced a new algorithm update on April 8, 2021 (after we pulled all the data and analyzed this) called the “Product Reviews Update”. The update aims to give a boost to reviews that are in-depth vs those that are thin. 

From our initial look, we’re seeing roughly the same results in the SERPs with slight fluctuation. We believe there wasn’t much of a shift in the top spots for the keywords we are watching because those spots are owned by strong reviews like the Wirecutter’s.

Learn more about it here:

https://developers.google.com/search/blog/2021/04/product-reviews-update?hl=en

Key Takeaways

The “big boys” in the market seem to be winning for highly searched and competitive keywords based on our analysis. Don’t let this discourage your efforts though. With shifts in the way search engines reward sites, focusing on creating the absolute best content you can serve will prove advantageous in the long run.

  • Large online publications own the SERPs for “Best” keywords.
  • The top ranking pages normally have almost identical content classifications.
  • Using an NLP content classification tool to understand the categorization of your content can be beneficial.
  • The hub and spoke model (or topic clusters) is a must for your content strategy.
  • Internal links are important and shouldn’t be overlooked.
  • Google introduced the “Product Reviews Update” that rewards in-depth product reviews.