Semantic keyword clustering is an indispensable part of keyword research, but it is often a time-consuming and complicated task. According to a study conducted by search marketer Paul Shapiro, an average keyword analysis takes 5 to 28 hours.
Automating this process surely comes in handy! Fortunately, developments in artificial intelligence are accompanied by the rise of tools and techniques that help you automate keyword clustering.
But how do you automate this without racking your brain over complicated code, for example in Python?
Equally relevant: why do we actually cluster keywords? How do we ensure we group keywords properly? And how can KeyWI help you formulate data-driven content actions in just a few minutes?
In this article, I give the answers to these questions and explain why the quality of automatic semantic clustering is in most cases better than manual clustering.
In 2013, Google announced Hummingbird, the code name for a new algorithm update. It includes several components, one of which is RankBrain, which was announced in 2015.
Hummingbird is able to understand the semantics of a user's search query. It considers the whole search query - a word or whole sentence - rather than individual words.
In 2015, Google announced RankBrain, a machine learning technology and an extension of Hummingbird, which helps interpret search queries even better. RankBrain is able to see patterns between seemingly unrelated search queries and learns how they are similar.
So RankBrain works with Hummingbird to provide better search results for user searches. However, RankBrain goes beyond semantic search alone. The self-learning algorithm applies what it has learned to future search queries. These can be similar search queries but also unfamiliar combinations of searches.
In 2019, Google introduced a new algorithm update called BERT. This model uses natural language processing (NLP) to understand each word in relation to all other words in a sentence. More recently, Google announced that it is developing a new technology that, according to Google, is 1000x more powerful than BERT: MUM.
“MUM is a technique that enables transferring knowledge across languages. MUM not only understands language, but also generates it. It’s trained across 75 different languages and many different tasks at once, allowing it to develop a more comprehensive understanding of information and world knowledge than previous models.” (Nayak, Google, 2021)
MUM is also multimodal. This means that MUM can understand information of different formats, such as web pages, photos, videos and more, simultaneously.
“... MUM is multimodal, so it understands information across text and images and, in the future, can expand to more modalities like video and audio.” (Nayak, Google, 2021)
According to Google, search engines are not yet able to directly solve very complex search queries. Often, the models do not 'understand' the context or the latent needs. It requires users to perform multiple searches before they find their answer.
“People issue eight queries on average for complex tasks ...” (Nayak, Google, 2021)
With MUM, Google comes closer to providing instant answers to complex questions. Think, for example, of the "next query you're going to type in". Google would be able to display or suggest the answer to your third question during your first search. Latent needs become apparent sooner.
It’s thus essential for Google's models to understand someone's intent and what intent and informational needs are hidden behind particular search queries.
Two (or more) seemingly unrelated search queries can address the same informational needs and intent of the user.
Let’s take the following example to illustrate.
Search query 1 & 2
“Arabica coffee beans”
“Robusta coffee beans”
At first glance, it seems like the keywords have the following intents and informational needs:
If you were to group these keywords manually, purely based on syntax or the underlying meaning of the search terms, an obvious cluster name would be "types of coffee beans". There is a semantic relationship, but is this information sufficient to formulate actions or to make logical inferences as an SEO marketer?
When searching for ‘Robusta coffee beans’ in Google, it becomes clear that 2 of the top 3 results are blog articles addressing the (quality) differences between Arabica and Robusta coffee beans:
Similar search results pop up with ‘Arabica coffee beans’.
The semantic link ‘type of coffee beans’ is obvious at first glance, but without further analysis one might have overlooked the fact that the user intent and informational need are virtually the same for both search queries.
This is the power of RankBrain.
Ads indicate that commercial and transactional intent are present too. With one exception, however, the organic search results do not include any URL that responds to these intents with, for example, an e-commerce landing page featuring an overview of Robusta (or Arabica) coffee beans. Apparently, the informational intent and need dominate these search queries.
The above insights could only have been collected by analysing the SERP separately for each keyword.
Manually carrying out such an analysis for a keyword research of 5,000 or more search terms is simply not feasible.
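To make the idea of comparing SERPs per keyword more concrete, here is a minimal sketch in Python. It is not how KeyWI works internally (the article does not describe that); it only illustrates one common approach, grouping keywords whose top results share enough URLs. The function names, the 0.4 threshold and the example URLs are all assumptions made up for illustration.

```python
from itertools import combinations


def serp_overlap(urls_a, urls_b):
    """Jaccard similarity between two sets of ranking URLs (0.0 to 1.0)."""
    a, b = set(urls_a), set(urls_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_by_serp(serps, threshold=0.4):
    """Group keywords whose top results overlap at least `threshold`.

    `serps` maps each keyword to its list of top-ranking URLs.
    Keywords are linked transitively with a simple union-find.
    """
    parent = {kw: kw for kw in serps}

    def find(kw):
        # Walk up to the group's representative, shortening the path as we go.
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]
            kw = parent[kw]
        return kw

    for kw_a, kw_b in combinations(serps, 2):
        if serp_overlap(serps[kw_a], serps[kw_b]) >= threshold:
            parent[find(kw_a)] = find(kw_b)

    groups = {}
    for kw in serps:
        groups.setdefault(find(kw), []).append(kw)
    return list(groups.values())


# Made-up example data; real URLs would come from a SERP export.
example_serps = {
    "arabica coffee beans": [
        "a.example/arabica-vs-robusta", "b.example/differences", "c.example/arabica",
    ],
    "robusta coffee beans": [
        "a.example/arabica-vs-robusta", "b.example/differences", "d.example/robusta",
    ],
    "buy a bike": ["e.example/bikes", "f.example/bicycles", "g.example/shop"],
}
print(group_by_serp(example_serps))
```

In this made-up data the two coffee queries share two of four distinct URLs, so they end up in one group, while 'buy a bike' stays on its own. The threshold is a judgment call: a higher value gives smaller, stricter clusters.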
As an SEO consultant, your task is to formulate the right actions based on insights derived from the keyword analysis. Prior to content creation you first want to provide well-founded answers to the following questions:
And ideally also
Before you can formulate the right actions, you first need to know which keywords are semantically related and which keywords meet the same intent and informational needs.
Keyword clustering means grouping keywords that are semantically related and address the same search intent.
How does this work?
In 2017, I wrote an article about 'whisky for beginners' for the Dutch retailer Gall & Gall. At the time, I was a novice whisky drinker myself, so I used my own experience to work out what the article should cover. I asked myself the previously mentioned questions:
Below are the top 18 current rankings of the article without sitelink rankings, 4 years later.
First of all, the primary keyword 'whisky for beginners' is mentioned 0 times in the article. Is this a bad thing? Not necessarily. I forgot to put keywords in the text [1] at the time. Instead, I focused on the questions above.
The article furthermore addresses the following topics:
The article ranks for these search queries largely [2] because the semantic relationship and search intent are very similar for all of them. In addition, the overarching theme is 'whisky for beginners'. Beginners have specific questions or needs. For instance, some questions will never be asked by an experienced whisky drinker.
A beginner whisky drinker searches for sweet or smooth whiskies because those are the entry-level whiskies: the ones that are predominantly easy to drink. Learning to drink whisky is something you only do as a beginner, and the underlying informational need is knowing how to start and which whiskies to start with. 'Whisky flavours' or 'tasty whisky' are typical search queries used by someone who has little knowledge of whisky.
You could say that Google Search, with the help of RankBrain’s self-learning algorithm and other models, was sophisticated enough back in 2017 to determine the semantic relationship between these search queries. Hence the article ranks on keywords that do not necessarily have to appear in the article as the underlying user intent and informational needs are met.
Correct clustering of keywords thus creates a strong start for your content strategy. It yields better insight into how you can organise pages and rank for particular clusters of keywords.
Other benefits of semantic keyword clustering are:
Keyword clustering starts with keyword research. Collect as many relevant keywords as possible, including all variations, long-tail keywords and subtopics.
When performing keyword research, make sure ‘relevance’ and ‘search intent’ are top of mind. Marketing budgets are spent wisely on content that brings relevant visitors to the site. It sure can be a tough job though to determine what exactly makes a search query relevant. I mean, ‘relevance’ is a commonly used buzzword.
The following factors will help to narrow down the scope and improve accuracy of keyword research:
Depending on these factors, your site may or may not be able to rank for certain search queries in the first place.
The type of page and content formats also play a part in understanding and addressing user intent. Some search queries can only be served by particular types of pages and content modalities, depending on how the user wants to consume content.
An example to illustrate.
Suppose you are hired as an SEO marketer by the imaginary bicycle brand "FastBikey". The bicycle brand hosts a platform with an integrated webshop featuring branded products. In the keyword research you came across the keyword 'buy a bike'.
Is this keyword relevant?
No.
It’s clear that established parties dominate the top search results:
Users searching for "buy a bike" are considering buying a bicycle but don’t have a particular brand preference yet. To fit that need, it would make sense to see an overview featuring images of different bicycles with copy content that meets the explicit and latent informational needs of users.
FastBikey as an individual bicycle brand cannot meet these informational needs or address user intent. And building a relevant page is simply beyond FastBikey’s scope.
Understanding which keywords are relevant is essential to creating the right semantic content clusters and driving relevant traffic to your site.
KeyWI is an AI-driven keyword clustering tool that semantically analyses and groups a set of keywords in just a few minutes.
It takes you from a structured keyword list to a complete keyword clustering in just 2 easy steps. I explain the steps using examples, including immediate insights and concrete actions.
The test set is a Dutch keyword list of 469 keywords on the topic ‘artificial grass’.
First upload the CSV of the keyword set.
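Before uploading a keyword list, it can help to normalise and de-duplicate it. The sketch below shows one way to do that in Python. The column name `keyword` and the function name `load_keywords` are assumptions for illustration; the article does not specify the CSV layout KeyWI expects.

```python
import csv
import io


def load_keywords(csv_file, column="keyword"):
    """Read keywords from an open CSV file, normalised and de-duplicated.

    Normalisation here means lowercasing and collapsing repeated spaces.
    The original order of first occurrences is preserved.
    """
    seen = set()
    keywords = []
    for row in csv.DictReader(csv_file):
        kw = " ".join((row.get(column) or "").lower().split())
        if kw and kw not in seen:
            seen.add(kw)
            keywords.append(kw)
    return keywords


# Small in-memory sample instead of a real file (assumed layout).
sample = io.StringIO("keyword\nkunstgras leggen\nKunstgras  leggen\nkunstgras online\n")
print(load_keywords(sample))
```

With a real file you would pass `open("keywords.csv", newline="", encoding="utf-8")` instead of the in-memory sample; the file name is, again, only an example.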
Then configure the geolocation settings to your preference. I chose the following setup:
You can specify a domain at 'Domain rank’. For example, you can choose a client’s domain name [3]. KeyWI collects ranking data of all keywords in the set for the specified domain. This allows you to analyse the site’s visibility per cluster or even subcluster.
Ready? Press 'Cluster'.
The clustering of 469 keywords takes about 2 minutes.
After 2 minutes, KeyWI generates the following visualisation.
The first layer contains the primary content clusters. The second layer contains the subclusters. KeyWI has assigned each content cluster (= circle) a dominant user intent. For example, a blue cluster is predominantly informative.
Of course, a (sub)cluster or individual keyword does not have to correspond to a single intent. The keyword 'buy a bike' from the earlier example, for instance, has a mix of informational, commercial and transactional intent.
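As an illustration of what a 'dominant intent' could mean in practice, the following Python sketch counts intent labels across the keywords of a cluster and returns the most frequent one. This is an assumed, simplified rule, not KeyWI's actual method, and the intent labels attached to the example keywords are made up.

```python
from collections import Counter


def dominant_intent(keyword_intents):
    """Return the most frequent intent label across a cluster's keywords.

    `keyword_intents` maps each keyword to a list of intent labels,
    because a single keyword can carry several intents at once.
    Returns None for an empty cluster.
    """
    counts = Counter(
        intent for intents in keyword_intents.values() for intent in intents
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Illustrative labels only; they are not taken from the tool's output.
cluster = {
    "kunstgras leggen": ["informational", "commercial"],
    "zelf kunstgras leggen": ["informational"],
    "kunstgras online": ["navigational"],
}
print(dominant_intent(cluster))
```

A real tool would likely weigh intents by search volume or SERP features rather than by simple counts; the sketch only shows why a cluster can have one dominant intent while individual keywords have mixed ones.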
Select one of the clusters.
In the example below [4] you see a predominantly informative content cluster with commercial as a secondary intent. The topic is 'laying (or installing) artificial grass' [5].
What can be observed:
If you click on the second layer on the second bubble from the right, you go one layer deeper. Among the bubbles is the subcluster "laying artificial grass - generic".
The keywords are variations of saying ‘laying artificial grass’. Virtually all have both informative and commercial intent. After all, it’s impossible to derive from the keywords whether the search queries are performed by people who search for a service that can install artificial grass for them or who prefer to install artificial grass themselves. Google search results confirm this.
The ads shown meet the commercial intent and the need of individuals or companies looking for a service company that installs artificial grass.
Also, the top three [6] organic search results are dominated by blog articles that meet the informational intent and provide searchers with tips on how to install artificial grass themselves [7]:
Another subcluster in the same primary cluster ‘laying artificial grass’ is ‘laying artificial grass garden'.
This example is specifically about installing artificial grass in the garden. This ought to give a better indication that it concerns individuals who want to install artificial grass in the garden themselves.
However, Google search results return more or less the same results as the previous subcluster. Namely, blog articles about installing artificial grass yourself.
Insights
Actions for organic content [8]
In a similar vein, you can easily analyse other content clusters.
Would you like to know the specified domain’s performance in Google? KeyWI also visualises the average ranking per (sub)cluster.
Is a cluster grey? Then the entire (sub)cluster does not rank.
The above visualisation provides you with direct insight into possible opportunities for content creation and optimisation.
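The average ranking per (sub)cluster, including the 'grey' state for clusters that do not rank at all, could be computed roughly as follows. This is a sketch under assumptions: the data structures, the function name and the example positions are hypothetical, and `None` stands in for a grey cluster.

```python
from statistics import mean


def average_rankings(clusters, rankings):
    """Average position per cluster; None when no keyword ranks.

    `clusters` maps a cluster name to its keywords. `rankings` maps a
    keyword to the domain's position; a missing keyword or a value of
    None means the domain does not rank for it.
    """
    result = {}
    for name, keywords in clusters.items():
        positions = [
            rankings[kw] for kw in keywords if rankings.get(kw) is not None
        ]
        result[name] = mean(positions) if positions else None
    return result


# Made-up positions for illustration.
clusters = {
    "buying": ["buying artificial grass", "ordering artificial grass"],
    "laying": ["laying artificial grass"],
}
rankings = {"buying artificial grass": 3, "ordering artificial grass": 5}
print(average_rankings(clusters, rankings))
```

Note that this sketch averages only over the keywords that actually rank. Whether non-ranking keywords should pull the average down is a design choice the article does not specify.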
But also high-performing pages or search terms are worth evaluating.
The following image shows an example of the green subcluster 'buying online' of the parent subcluster 'buying' of the commercially dominant cluster 'artificial grass generic / buying'.
Something odd happened: to the first three terms, which contain ‘kunstgras online’ [9], KeyWI has attributed only navigational intent. You would expect commercial or transactional intent as the three search terms are about buying or ordering artificial grass online.
Another subcluster, containing only the related generic keywords 'buying artificial grass' and 'ordering artificial grass' has an average ranking of 1. The keywords have both commercial and transactional intent.
Both subclusters’ keywords target the same page.
In summary, the following can be observed:
An examination of the search results in Google shows a case of Exact-Match-Domain (EMD) [10]. In these cases, the match of the domain name can induce users to click on that search result. After all, users are specifically looking for artificial grass they can order online.
But is that the reason it ranks first? And does this also mean that users were initially searching specifically for the company Kunstgras Online? Probably not. Or maybe some. And how do Google’s algorithms interpret such search behaviour?
Logical questions that are beyond the scope of this article.
More important is to deduce that it is more difficult to rank first for search terms containing 'kunstgras online'.
However, it is possible to carry out on-page optimisations that will boost rankings for these search queries. For example, Kunstgras Direct can optimise its meta title, ‘Always greener than your neighbour's lawn!’, which isn’t really helpful for these queries.
Also, there is no mention of ordering artificial grass online on the landing page.
KeyWI generates intuitive, actionable insights for content optimisation and content creation. It is fast, easy to use and above all data-driven.
The automatic semantic keyword clustering with KeyWI not only saves time, but also improves the quality of semantic clustering.
Other advantages of automatic keyword clustering with KeyWI:
Would you like to know more about the tool or test it yourself? Try our keyword clustering tool for free. Discover whether it fits your marketing needs. We are convinced it does. Want to know more about keyword clustering, SEO related topics or automation of SEO insights? Please contact Bartjan.
1. In 2017 this method was predominantly used in SEO.
2. Other ranking factors such as URL or domain authority, internal links and backlinks contribute to the rankings too.
3. The example of kunstgrasdirect.nl is purely illustrative. Neither Bartjan nor KeyWI is associated with kunstgrasdirect.nl.
4. All print screens are in Dutch.
5. In Dutch: ‘kunstgras leggen’
6. The first result is a featured snippet. The #2 above takes the original organic #1 position in Google.
7. ‘Zelf kunstgras leggen’ is Dutch for ‘laying artificial grass yourself’
8. The automatic generation of actions is a feature currently in development.
9. ‘Kunstgras online’ is Dutch for ‘artificial grass online’
10. In 2012, Matt Cutts announced an algorithm change designed to reduce the number of low quality EMDs in search results. EMDs can still have a positive effect as long as sites have authority and create quality content.