Semantic Search – the Wind of Change

by Malvina Vega

What is Semantic Search?

Underneath the raw technical term lies an innocent desire, old as humanity itself. 

Humans have always tried to look past appearances and get to the deeper meaning of what surrounds them. 

On occasion, this has led us to profound realizations. At other times, we’ve managed to fail spectacularly.

Now we have the technology to supercharge and deepen our quest for meaning. 

Enter semantic search.

The Magic of Words and Semantic Search

Semantics is the fascinating branch of linguistics tasked with the study of meaning.

The meaning of words and their relation to one another. Semantics seeks to explain why we choose certain words and phrases to describe things.

What defines semantics as an essential part of semantic search is the yearning we have to seek and create connections.

Imagine looking for a needle in a haystack – an undeniably nerve-wracking experience. 

Without tools that enable fast and intuitive results, you’d be searching the internet with about the same level of success.

Fortunately, our drive to make life structured and connected carries over into how we search for things on the web. This is how semantic search came to be.

We get a more detailed explanation of what semantic search is from a publication by Hannah Bast and co-authors.

As described by them, semantic search is “search with meaning”. And we can find meaning in at least a couple of parts of the search process.

First, in the query itself. Here, we need to figure out the true intent behind the request.

Then, we have to consider the data we have to retrieve, and if it truly fits what we’re looking for. 

And, finally, in whether we present the information properly, so that it carries meaning for the searcher.

Breaking Down the Meaning of Semantic Search

To put it in layman’s terms, semantic search seeks to understand natural language the way a human would and return appropriately relevant search results.

What does that mean?

Well, let’s say I type in Google’s search field “which is the smallest mammal.” 

The search engine will, understandably, answer my question based on the assumption that I want to find out what the smallest mammal is – rather than look for exact matches of the phrase I’ve typed.

This is how I get as a first result an article named “World’s 6 Smallest Mammals” followed by photos of the Etruscan shrew – which, by the way, is the smallest known mammal on the planet.

Etruscan shrew

(By Trebol-a – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=4607913)

Looking to understand the meaning of my query helps semantic search engines suggest corrections for misspelled words as well.

So, if I happen to misspell the word mammal, Google will suggest that instead of “mamal” I’m probably looking for “mammal.”

How did semantic search come to be?

Our species is drawn to look for order – and if such is lacking, we can’t help but try to create it. 

So it’s understandable that we’re building a virtual world that answers our need for order and optimized time.

Besides retrieving the proper answers, search engines also make sense of them with the help of artificial intelligence.

They use machine learning to process and rank information, and they can also understand natural human speech.

All this, in the end, provides adequate results to our queries.

But how exactly are they able to answer questions such as “World’s Biggest Doughnut?”

Semantic search emerged from the semantic web, so – to be true to my own order-seeking nature – let’s look at what the semantic web is first.

Semantic Web Origin

Semantic Web is an extension of the World Wide Web. 

And according to the World Wide Web Consortium (W3C), it provides a common framework for data to be shared and reused. 

This is valid across applications, enterprises, and communities.

The framework, or “ontology”, as it is known in the field of information science, gathers facts and information that eventually become a system of knowledge.

To put it simply, semantic web structures and tags data in a way computers can read.

Semantic web allows analysis of specific inputs based on network or related factors. It uses sets, properties, and relations to make sense of the vast amount of data that comprises the Web.

I would compare it to me trying to build my family tree. 

I’d definitely fail to place the people my grandma claims are my distant cousins on my mother’s side. I lack context, since I don’t know them.

Semantic web, however, does a better job of sorting things out.

The Vision for Semantic Web

The ultimate ambition of the Semantic Web, as seen by its founder Tim Berners-Lee, is to enable computers to better manipulate information on our behalf.

The concept of the semantic web has evolved into the two important types of data that form it today – Linked Open Data and Semantic Metadata.

Order in the Chaos – Tidying Up With Semantic Search Tools

Linked Open Data (LOD) is modeled as a graph and published in a way that allows interlinking across servers. 

It essentially represents structured data. In 2006, Tim Berners-Lee formalized the four rules of linked data (illustrated with a short sketch after the list) as:

  1. Use Uniform Resource Identifiers (URIs) as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
  4. Include links to other URIs, so that they can discover more things.
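
To make the four rules a bit more concrete, here is a minimal sketch in Python using the rdflib library. The entities and facts are purely illustrative (DBpedia-style URIs chosen by me, not part of any official example):

```python
# A minimal sketch of the four Linked Data rules, using the rdflib library (rdflib 6+).
# The URIs and facts below are illustrative only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

DBR = Namespace("http://dbpedia.org/resource/")   # Rules 1 & 2: HTTP URIs as names for things
EX = Namespace("http://example.org/vocab/")       # hypothetical vocabulary for this sketch

g = Graph()
tim = DBR["Tim_Berners-Lee"]

# Rule 3: when someone looks up the URI, provide useful, standards-based data (RDF).
g.add((tim, RDF.type, FOAF.Person))
g.add((tim, FOAF.name, Literal("Tim Berners-Lee")))

# Rule 4: include links to other URIs so people (and machines) can discover more things.
g.add((tim, EX.proposed, DBR["Semantic_Web"]))

# Turtle is one of the standard serializations used to publish Linked Data.
print(g.serialize(format="turtle"))
```

Each fact is just a subject–predicate–object triple, and because the names are resolvable HTTP URIs, anyone – human or machine – can follow them to more data.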

LOD enables both people and machines to access data across different servers and interpret its semantics more easily. 

As a result, the Semantic Web shifts from a space of linked documents to a space of linked information.

That, in turn, allows for an interconnected network of meaning, processable by a machine.

There are thousands of datasets, published as LOD across different sectors. 

Some examples are encyclopedias, geographic data, government data, scientific databases and articles, entertainment, traveling, etc.

 

The Linked Open Data cloud
(By Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak – http://lod-cloud.net/, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=59839961)

Since they’re interlinked, these datasets form a giant web of data or a Knowledge Graph. 

The Graph connects a vast amount of descriptions of entities and concepts of general importance.

A Game of Tag – Semantic Search Tools Vol. 2

The second important tool semantic web counts on is Semantic Metadata.

These are basically semantic tags, added to regular web pages in order to better describe their meaning.

For instance, the home page of the Nobel Prize can be semantically annotated with references to several relevant concepts and entities – Sweden, academic advances, culture, and award, among others.

These well-determined relationships between subjects and corresponding results are best represented through structured metadata schemes, such as Schema.org.

Metadata makes it much easier to find Web pages based on semantic criteria.
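
As a purely hypothetical illustration (not taken from the Nobel Prize site or the W3C), such an annotation could be represented as a set of entity references attached to the page – here sketched as a plain Python structure using DBpedia-style identifiers:

```python
import json

# Hypothetical semantic annotation for a page such as nobelprize.org.
# The entity URIs are illustrative DBpedia-style identifiers, not official markup.
page_annotation = {
    "url": "https://www.nobelprize.org/",
    "mentions": [
        {"label": "Nobel Prize", "entity": "http://dbpedia.org/resource/Nobel_Prize"},
        {"label": "Sweden", "entity": "http://dbpedia.org/resource/Sweden"},
        {"label": "award", "entity": "http://dbpedia.org/resource/Award"},
    ],
    "topics": ["academic achievement", "culture"],
}

# A search engine that understands these tags can match the page to queries about
# the entities themselves (e.g. "Swedish science awards"), not just the exact words.
print(json.dumps(page_annotation, indent=2))
```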

By learning from past results and creating links between entities, a search engine might then be able to deduce the answer to a searcher’s query, rather than provide several links that may or may not contain the correct answer.

Metadata resolves any potential ambiguity and ensures that when we search for Prince (the musician), we will not get pages about any of the many princes who are royalty, for example.

You can thank the semantic web for that.

Now.

The structure of the semantic web gives us an idea of what semantic search is. It even tells us how a search engine determines what the world’s biggest doughnut is.

But

Let’s take a look at its history.

Visionaries in the Field of Semantic Search

As with any large-scale movement, there’s a leader behind the change. We’ve already mentioned the name of Tim Berners-Lee, who many argue is the man behind semantic search.

In 1998, during the infancy of the modern web, Berners-Lee was already talking about the idea in a report he published, titled Semantic Web Road Map. 

21 years later, his ideas have been adopted and semantic search is a reality.

Google was the company that brought about the change and paved the way for the rise of semantic search.

“Machines should be able to communicate with each other just like humans can,” Berners-Lee stated. 

Google is now working towards fulfilling his vision.

How?

The Turning Point for Semantic Search

While a lot has happened since 1998, 2012 was the turning point for semantic search. 

It was during this year that around 20% of all Google searches were queries that had never been seen before. Not only that, but long-tail keywords made up around 70% of all searches.

This told Google that users were becoming interested in using their search engine as a tool for answering questions and solving problems. 

It wasn’t just about looking up facts and finding individual websites anymore.

And thus the first step toward a semantic update was made.

The Knowledge Graph

Introduced in 2012, the Knowledge Graph marked Google’s shift to understanding entities and context, instead of mindlessly comparing strings of keywords. 

Or as Google phrased it, “things, not strings.”

What is the Knowledge Graph?

Wikipedia states that Google and its services use the Knowledge Graph to enhance the search engine’s results with information gathered from a variety of sources.

In other words, a knowledge graph is a programmatic way to model a knowledge domain – with the help of experts in the subject, data interlinking, and machine learning algorithms.

What made this particular graph a semantic search tool was the way it collected information.

It gathered data considered public domain (e.g., from the size of the Earth to the names of a band’s members), together with the properties of each entity (birthdays, siblings, parents, occupations – everything that can be linked to that entity).

Or

We can say it built on top of existing databases to link vast amounts of data together – combining both structured information (lists) and unstructured information (free text).

The knowledge graph gathers the information the search engine requires to give sensible answers.
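
To make “things, not strings” a bit more tangible, here is a tiny, purely illustrative sketch of an entity with its properties. The structure and data are my own toy example, not Google’s internal format:

```python
from typing import Optional

# A toy knowledge-graph entry: entities, their properties, and links between them.
# Purely illustrative – not Google's internal representation.
knowledge_graph = {
    "Etruscan shrew": {
        "type": "Animal",
        "class": "mammal",
        "notable_for": "smallest known mammal by mass",
        "related": ["mammal", "shrew"],
    },
    "mammal": {
        "type": "Taxonomic class",
        "related": ["Etruscan shrew"],
    },
}

def answer(query: str) -> Optional[str]:
    """Answer a query by looking for an entity whose facts match the intent."""
    if "smallest mammal" in query.lower():
        for name, facts in knowledge_graph.items():
            if facts.get("class") == "mammal" and "smallest" in facts.get("notable_for", ""):
                return name
    return None

print(answer("Which is the smallest mammal"))   # -> Etruscan shrew
```

The point is that the engine reasons over entities and their properties, not over the literal characters in the query.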

Google’s graph set the stage for the large-scale algorithmic changes to come. And soon it was followed by Hummingbird.

Accelerating Towards Success With Hummingbird 

Hummingbird was a turning point. The algorithm impacted around 90% of searches worldwide. 

It was designed to be precise and fast, and many refer to it as the update that introduced “conversational search”.

It became the star of semantic search technology.

However, Hummingbird does more than just offer answers to conversational queries.

The algorithm pays attention to each word in a query. 

Then it makes sure the whole query – the entire sentence and its meaning – is taken into account, rather than particular words.

The intention is to get pages matching the deeper meaning, rather than just the actual words.

There’s more.

In addition to the enhancements in speed and accuracy of the Hummingbird update, Google made sure it integrated semantic search.

They significantly improved their understanding of search queries – even long-tail search – and thus user intention.

As a result

Entire queries and the relations of word groups within search queries were identified, targeted, and interpreted.

The Effects of the Hummingbird Algorithm

The Hummingbird improvements were particularly focused on contextual and conversational search.  

Both areas are strongly linked to fundamental semantics and the relation between words.

Now.

The algorithm processes natural language in order to retrieve niche results for queries both at the head and long-tail level.

In other words, it uses contextual search where Google increasingly returns results that match the intention behind the query. 

Results are no longer limited to the words themselves but include an interpretation of intent for the search terms.

How exactly?

What the tool does is check for relations that have not been explicitly modeled. 

The process combines grammar, statistics, and dictionaries to achieve relational tagging.

By appraising the intent in a semantic manner and focusing on synonyms and theme-related topics, Hummingbird allows its users to confidently search for topics and sub-topics instead of trying to “abracadabra” their way through the search.

The algorithm is in many ways a definition of semantic search.

A search such as “President of England” illustrates how Hummingbird actually works.

Now.

England doesn’t have a President, but a Prime Minister, who’s the head of the government. England also has a Head of State, who’s the Queen.

And Google knows that, so it will display results related to the Prime Minister or the Queen.

In a way, Hummingbird allows people to get an answer to a question they don’t know how to ask – and curate results that help users find what they’re looking for.
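
Hummingbird’s internals are not public, so the following is only a rough sketch of the idea of rewriting a query into related entities; the relation table is invented for illustration:

```python
# Toy sketch: rewrite a query into known related entities, so results can match
# the intent ("who leads England?") rather than the literal words.
# The relation table below is made up for illustration.
RELATED_ENTITIES = {
    ("england", "president"): ["Prime Minister of the United Kingdom", "British monarch"],
    ("germany", "king"): ["Chancellor of Germany", "President of Germany"],
}

def interpret(query: str) -> list:
    words = set(query.lower().replace("?", "").split())
    for (place, role), targets in RELATED_ENTITIES.items():
        if place in words and role in words:
            return targets      # the concepts the engine will actually search for
    return [query]              # fall back to the literal query

print(interpret("President of England"))
# -> ['Prime Minister of the United Kingdom', 'British monarch']
```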

Location-Oriented Results

Another improvement Hummingbird brings is location-oriented results.

Thanks to the use of context, local results become more precise.

So when you’re looking for good Italian restaurants, Google will assume you want to have dinner in your city. 

That’s why it will use your location data to recommend good pizza in your area, instead of listing restaurants in Italy.

We often take for granted the precision with which we get the right results.

It is the fruitful harvest of years of research and development behind the scenes. 

The dream of semantic search took shape through a combination of conversational language processing and understanding human intent based on location data.

Hummingbird was an important breakthrough for semantic search, but Google didn’t stop there. 

Another quite important improvement they introduced later on was RankBrain.

Artificial Intelligence in the Semantic Web World

RankBrain is the semantic search machine learning tool that came as an answer to a problem Google stumbled upon while answering keyword queries.

A few years ago, around 15% of the searches Google got consisted of words it had never seen before.

It had no way of knowing exactly what the user was looking for.

At first read, 15% might not seem like a big deal. 

Still, Google processes billions of requests every day, so the percentage was a pretty significant number in absolute terms.

That meant some 450 million searches a day contained keywords that had never been processed before.

So what do you do when you don’t know how to answer a question?

Guess?

That’s what Google used to do when it received requests for any of those unknown keywords.

Unfortunately, that didn’t lead to accurate results. The search engine just looked for pages that contained all keywords the user had entered, without understanding the intent behind them. 

It had no way to produce semantically meaningful results for requests it had never received before.

That pushed Google to find a solution and introduce a tool that could learn on the go.

Enter RankBrain

This is the machine learning-based (AI) component of Google’s search algorithm that helps process queries and provide more relevant results for users.

Google uses the AI algorithm not only to resolve those search queries but also to process and understand them.

What changed with RankBrain?

Before RankBrain, 100% of Google’s algorithm was hand-coded. 

So, the process relied a lot on human engineers who tried to guess what would improve search results.

Today human engineers still work on the algorithm, but RankBrain also does its thing in the background.

The Process

In short, RankBrain can tweak its own algorithm to produce a better response.

Depending on the keyword, RankBrain increases or decreases the importance of backlinks, content freshness, content length, domain authority, and other ranking variables.

Then it observes how users interact with the new search results. If they like the new algorithm better, it stays. 

If not, RankBrain rolls back the old algorithm.
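
RankBrain’s actual mechanics are proprietary, so the following is only a toy sketch of the “adjust the weights, watch the users, keep or roll back” loop described above. All the signal names and numbers are invented:

```python
# Invented ranking-signal weights for a particular type of query.
old_weights = {"backlinks": 0.50, "freshness": 0.20, "content_length": 0.10, "domain_authority": 0.20}
new_weights = {"backlinks": 0.40, "freshness": 0.35, "content_length": 0.05, "domain_authority": 0.20}

# Hidden "true" user preference over the signals – unknown to the engine,
# used here only to simulate how users respond in this toy experiment.
TRUE_PREFERENCE = {"backlinks": 0.2, "freshness": 0.5, "content_length": 0.1, "domain_authority": 0.2}

def observed_satisfaction(weights: dict) -> float:
    """Toy stand-in for aggregated UX signals (CTR, dwell time, ...)."""
    return sum(weights[k] * TRUE_PREFERENCE[k] for k in weights)

# Keep the new weighting only if users respond better to the rankings it produces;
# otherwise roll back to the old one.
weights_in_use = new_weights if observed_satisfaction(new_weights) > observed_satisfaction(old_weights) else old_weights
print("Weights in use:", weights_in_use)
```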

With the help of this smart semantic update, Google is able to figure out what you mean, even if it has never encountered your query before.

How?

By matching your never-before-seen keywords to keywords that it has seen before.

For an example of how this semantic matching works, RankBrain may have noticed that people search for “World’s biggest doughnut”.

And it had learned that people who search for that are pretty much looking to find the biggest doughnut ever made.

So when someone searches for “largest doughnut in the world”, RankBrain brings up similar results. 

And in the doughnut’s case, the first three webpages you get for both searches are the same.

The Method of RankBrain

Google has commented on how they’re using machine learning to better understand searcher intent through a technology called “Word2vec” that turns keywords into concepts.

For example, they say that this semantic web technology “understands that Paris and France are related the same way Berlin and Germany are (capital and country), and differently than Madrid and Italy.”

And even if they haven’t specifically mentioned that this is the way RankBrain works as well, we can pretty much guess it uses similar technology.

Going back to the idea of concepts over keyword-matching – RankBrain tries to give results based on the intention of your search.
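
Google hasn’t confirmed exactly what model RankBrain uses, but the Word2vec-style “keywords into concepts” idea can be shown with a few hand-made toy vectors. Real embeddings have hundreds of learned dimensions; these three-dimensional ones are made up purely to demonstrate the arithmetic:

```python
import numpy as np

# Hand-crafted toy "embeddings" – real word2vec vectors are learned from huge text corpora.
vectors = {
    "paris":   np.array([1.0, 1.0, 0.1]),
    "france":  np.array([1.0, 0.0, 0.1]),
    "berlin":  np.array([0.1, 1.0, 1.0]),
    "germany": np.array([0.1, 0.0, 1.0]),
    "madrid":  np.array([1.0, 1.0, 1.0]),
    "spain":   np.array([1.0, 0.0, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via vector arithmetic: b - a + c."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

# "Paris is to France as Berlin is to ...?"
print(analogy("paris", "france", "berlin"))   # -> germany
```

The capital-to-country offset points in roughly the same direction for every pair, which is what lets a model treat “Paris/France” and “Berlin/Germany” as the same kind of relationship.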

User Satisfaction vs RankBrain

Sure, RankBrain can take a gamble at understanding new keywords. And it can even adjust the algorithm on its own.

Number one question then is:

Once RankBrain shows a set of results, how does it know if they’re good?

Well – it observes.

RankBrain uses UX signals – at least that’s the technical term. 

In simpler words, this means RankBrain shows you a set of search results it thinks you’ll like. 

If lots of people like one particular entry, it’ll give that page a ranking boost.

What if they don’t?

Then the algorithm drops that page and replaces it with a different one.

What does RankBrain observe exactly?

It pays close attention to how we interact with the search results.

There are several signals it’s monitoring:

  1. Organic Click-Through Rate (CTR)
  2. Dwell Time
  3. Bounce Rate
  4. Pogo-sticking

These are known as user experience signals (UX signals).

Let’s look at an example and see how Google’s semantic web would interpret my search.

If I search for “phone with the best camera”, the first result I get is an article published mid-June. 

This calls back to the freshness of content RankBrain assesses when suggesting answers to queries. 

But let’s leave that one for the moment.

The algorithm will pay attention to the website I open. It will compare how many times it has been opened before for similar queries – thus giving the CTR.

Once I’ve opened the page, RankBrain will observe my dwell time. This is the time I spend on the website. That way, the algorithm will estimate if I found the information useful. 

If I open the page and see content that has nothing to do with my query, or content that is presented poorly, I’ll quickly go back to the results page.

If enough people do that, the website’s rankings will fall.

And if the page doesn’t load in time, the chance of a bounce increases, and with it the page’s ranking plummets.

Now, let’s say I’m not able to find what I am looking for with my first click on a page. I will probably continue probing the results I get until I find it. 

And that is another factor that RankBrain uses to analyze the success of its work – pogo-sticking.

The more I go back and forth, the less likely it is that RankBrain will suggest those unfortunate pages to the next user with similar searches.
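
Here is a purely hypothetical sketch of how those four signals could be folded into a single page score. The weights and thresholds are invented for illustration and are certainly not Google’s actual formula:

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    impressions: int          # how often the page was shown in the results
    clicks: int               # how often it was clicked (gives the CTR)
    avg_dwell_seconds: float  # average time spent on the page
    bounces: int              # quick returns to the results page
    pogo_sticks: int          # clicked this page, came back, clicked another result

def ux_score(p: PageStats) -> float:
    """Toy aggregation of UX signals; every weight here is made up."""
    ctr = p.clicks / max(p.impressions, 1)
    bounce_rate = p.bounces / max(p.clicks, 1)
    pogo_rate = p.pogo_sticks / max(p.clicks, 1)
    dwell = min(p.avg_dwell_seconds / 180.0, 1.0)   # cap the benefit at 3 minutes
    return 0.4 * ctr + 0.3 * dwell - 0.2 * bounce_rate - 0.1 * pogo_rate

engaging = PageStats(impressions=1000, clicks=300, avg_dwell_seconds=150, bounces=30, pogo_sticks=10)
shallow  = PageStats(impressions=1000, clicks=300, avg_dwell_seconds=20, bounces=180, pogo_sticks=90)
print(ux_score(engaging) > ux_score(shallow))   # -> True: the engaging page would rank higher
```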

Now.

We’ve covered the basic semantic tools search engines like Google use to understand and suggest adequate answers to their users’ requests. 

So, we can take a look at how we can use those to our advantage.

How to Optimize Content for Semantic SEO

For SEOs, understanding semantic search has major benefits. A large part of that is the ability to stay ahead in the race.

There are several steps to a good semantic SEO strategy suggested by experts all around. 

And as semantic search gets more influential as time passes, those steps are good advice to help anyone optimize their content and rank their website better.

  1. Consider topics, instead of just keywords
  2. Match content to search intent
  3. Include related keywords in your content
  4. Optimize your content for featured snippets
  5. Include structured data in the content

Consider Topics Instead of Just Keywords

As we’ve seen earlier in the article, it’s all about the topics – the context of one’s search. And Google and other search engines are looking to provide us with the most relevant results.

So content should be more comprehensive and informative than ever before.

If you’re thinking of creating short and flat pages of content for every variation of a broad search query – don’t bother. You should instead create a comprehensive and lasting guide that covers the entire topic.

You should then use keyword optimization best practices to ensure content is fully optimized for both search engines and readers.

Match Content to Search Intent

Before creating content for the SEO keywords you want to target, you should ask why the user would search for that phrase. Establish what intent the keyword represents and you’ll also have a much easier time engaging your audience.

The intent of the keyword can be one of the following (a short sketch after the list illustrates the idea):

  1. Informational – the user is trying to learn something, so they use “know” keywords to look for information and get answers;
  2. Navigational – the user is trying to navigate to a specific site or find a specific item, so they use “go” keywords to find the website for a familiar brand;
  3. Transactional –  the user is trying to make a purchase, so they use “do” keywords to find a product to purchase or a page to make a transaction.
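
As a rough illustration of that “know / go / do” split, here is a toy classifier. The cue words are just examples I picked; real intent detection is far more sophisticated:

```python
# Toy intent detection based on cue words; real systems use much richer signals.
INFORMATIONAL = {"how", "what", "why", "who", "guide", "tutorial"}
NAVIGATIONAL = {"login", "website", "homepage", "official"}
TRANSACTIONAL = {"buy", "price", "cheap", "deal", "order", "coupon"}

def classify_intent(query: str) -> str:
    words = set(query.lower().split())
    if words & TRANSACTIONAL:
        return "transactional (do)"
    if words & NAVIGATIONAL:
        return "navigational (go)"
    return "informational (know)"   # default assumption for everything else

print(classify_intent("how to make sourdough bread"))   # informational (know)
print(classify_intent("buy espresso machine"))          # transactional (do)
print(classify_intent("facebook login"))                # navigational (go)
```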

Include Related Keywords In the Content

To clear the semantics bar of semantic search, you should add related, or Latent Semantic Indexing (LSI), keywords to the content.

LSI keywords are phrases that are closely related to a target keyword. They give context to the content and help search engines better understand what the content means and how it serves audiences.

So when you talk about chocolate, you should at least relate it to cocoa.
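
One rough way to surface related terms is latent semantic analysis over a corpus of text, sketched here with scikit-learn. The five-document corpus is invented, and real keyword research would use far more data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# A tiny, made-up corpus: documents about chocolate and documents about phones.
docs = [
    "chocolate is made from roasted cocoa beans",
    "dark chocolate contains more cocoa than milk chocolate",
    "cocoa farming and chocolate production",
    "the phone has the best camera of the year",
    "camera comparison for every new phone",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# Latent Semantic Analysis: project the term space into two latent "topics".
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(tfidf)

terms = vectorizer.get_feature_names_out()
for i, component in enumerate(svd.components_):
    top_terms = [terms[j] for j in component.argsort()[::-1][:4]]
    print(f"topic {i}:", top_terms)

# Terms that load heavily on the same topic (e.g. "chocolate" and "cocoa") tend to
# appear in similar contexts – those are the related keywords worth weaving into content.
```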

Optimize Content for Featured Snippets

Search engines like to display rich results that give users the information they want – directly on their result page.

To increase search visibility, you might want to:

  1. Optimize content for answer boxes and paragraph, list, and table featured snippets
  2. Clearly answer questions in the content focusing on long-tail keywords
  3. Use formatting to make the information an attractive option for featured snippets

Finally, Include Structured Data in the Content

Another way to help search engines understand the meaning and relevance of your content is through structured data.

Structured data, or schema markup, is a form of microdata that adds additional context to copy on a webpage.

It uses a set of standard data structures that categorize content for search engines. 

This extra information helps search engines rank content and identify information that can be displayed in rich search results.
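
As a minimal sketch, this is roughly what such markup could look like using Schema.org’s Article type serialized as JSON-LD. The values are placeholders rather than a real page’s data:

```python
import json

# Placeholder Schema.org Article markup – the values are illustrative only.
article_markup = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Semantic Search – the Wind of Change",
    "author": {"@type": "Person", "name": "Malvina Vega"},
    "about": ["semantic search", "SEO", "knowledge graphs"],
}

# JSON-LD is usually embedded in the page's <head> inside a script tag:
snippet = '<script type="application/ld+json">\n' + json.dumps(article_markup, indent=2) + "\n</script>"
print(snippet)
```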

In practical terms, all we’ve said so far boils down to one thing. 

To make the most of our online presence, the information we publish should be semantically organized.  

Context is the future of semantic web search. While there are still pieces of the puzzle to collect, the semantic web is already alive.

Perhaps the day is not far off when a next-generation intelligent network will assist us by scheduling our appointments, doing our shopping, finding the information we need, and connecting us with like-minded people.

On top of that, it will do so autonomously.

We won’t have to ask what semantic search is then, for sure. It will have become an inextricable part of our everyday life. 
