Author: Michel Fortin
“While we frequently need just the facts, more often what we are really seeking is understanding.” — Lou Frenzel
I’ve been using email since the late 1980s. Before the days of webmail, I used software called Eudora to download my messages into desktop folders. At one point, I had thousands of emails stored in countless folders and subfolders. The challenge, which grew along with my folder count, was deciding which single folder to file an email into. Another challenge was finding an email later on, particularly when I forgot where I filed it.
Back then, search functionality was quite limited and only looked for exact matches. Trying to guess the words that an email might contain was excruciating, particularly if I needed to find it months or years later. The task became even more daunting if I used the wrong words, word order, or spelling. So, I would often give up in frustration.
Then in 2004, Gmail came along with a radically different way of managing email. I could label my emails rather than filing them into folders where they would otherwise end up being forgotten. Plus, emails didn’t need to follow a specific hierarchy. I could use as many labels as I wanted, and they could mean different things depending on what I might eventually need the email for.
Whether it was two or ten labels, I simply picked whatever topics made sense to me. I no longer racked my brain over which folder an email should go into, and it saved me a considerable amount of time, effort, guesswork, and frustration when I needed to search for them. I could file and find emails quickly based on why I needed them, and not just on where I might have filed them or on what words they might contain.
Now, what does my email have to do with SEO?
Just as Gmail changed how I categorize my emails, several years ago Google Search changed how it categorized documents in its database. Called “semantic search,” this new process would label topics in much the same way Gmail did, and instead of looking for exact matches, it allowed Google to consider context and meaning over individual keywords. This vastly improved its ability to understand and satisfy search queries, and it was a defining moment that profoundly changed the world of search.
This article will show you what it is, what it means, and how to harness it to improve your visibility.
What is semantic SEO and why does it matter?
Keywords are a fundamental part of the search process. But since the introduction of semantic search, individual keywords have become less important. It’s not that Google has stopped using them or that we should, for that matter. It’s that Google is now able to understand and match queries beyond keywords. Rather than looking for keywords, Google now looks for concepts. What’s the difference?
Lexical search (or search “with words”) looks for literal matches to queries, which might include alternatives and word variations, such as different spellings, synonyms, and word structure.
Semantic search (or search “with meaning”), on the other hand, looks for conceptual matches (i.e., what the queries are about and not just what they say), such as people, places, things, events, organizations, and so on, which may or may not include exact keywords.
In essence, concepts are keywords, but not all keywords are concepts. Keywords are just a bunch of words strung together, which can mean anything or nothing at all. Concepts, on the other hand, are topics, and the topical relevance of a match (search result) to a query is based on the query’s intent, not its wording.
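The contrast can be sketched in a toy Python example (a deliberate simplification: real semantic search relies on embeddings learned from massive corpora, and the concept vectors below are invented purely for illustration):

```python
import math
from collections import Counter

def lexical_score(query, doc):
    # Lexical search: count literal word overlap between query and document.
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

# Invented concept vectors; in practice these are learned embeddings.
VECTORS = {
    "cats the musical":      [0.90, 0.10, 0.80],
    "andrew lloyd webber":   [0.95, 0.05, 0.60],
    "hypoallergenic breeds": [0.05, 0.90, 0.10],
}

def semantic_score(a, b):
    # Semantic search: cosine similarity between concept vectors.
    # Two concepts can score high without sharing a single word.
    va, vb = VECTORS[a], VECTORS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(va) * norm(vb))
```

Here, `lexical_score("cats the musical", "andrew lloyd webber memory")` is zero because the two strings share no words, yet the semantic score between those two concepts is high because their vectors point in similar directions.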
During the information retrieval process, search engines can only find and match keywords, and they rely on those matches to determine relevance. But individual keywords are often poor indicators of relevance. For this reason, search engines analyze keywords based on where they appear, how often they appear, and how prominent they are, both on a page and across other pages.
With a statistical formula called “term frequency (multiplied by) inverse document frequency,” or TF-IDF, search engines measure relevance by weighing how often a keyword appears in a document against how many documents in the collection contain it: terms that are frequent in one document but rare across the collection score highest. TF-IDF does a reasonable job of measuring keyword relevance, but it often falls short.
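As a rough sketch, TF-IDF can be computed in a few lines of Python (a toy version for illustration only; production search engines use many refinements and different smoothing choices):

```python
import math

def tf_idf(term, doc, corpus):
    # Term frequency: how often the term appears in this document,
    # relative to the document's length (doc is a list of words).
    tf = doc.count(term) / len(doc)
    # Inverse document frequency: terms that appear in fewer documents
    # carry more weight (the 1+ is a common smoothing choice).
    docs_with_term = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / (1 + docs_with_term))
    return tf * idf
```

With this weighting, a word like “the” that shows up in every document is pushed down toward (or below) zero, while a term that is frequent in one document but rare elsewhere scores high.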
For example, suppose the word “cats” appears on a page more often than any other word. Surely the page must be about cats, right? Whether it’s the feline variety or the theatrical one is another issue; the keyword alone is too broad and ambiguous. And even if a query is more specific, like “hypoallergenic cats” or “Cats the musical,” a lexical match still fails to take intent into account. Is the goal to learn about breeds? Or to buy tickets?
Semantic search, on the other hand, can help uncover the purpose of a search and offer results that are more meaningful. Whilst lexical search only looks at keywords in isolation (i.e., what they say), semantic search looks at keywords in context (i.e., what they mean). It’s through this meaningfulness that search engines can better gauge the relevance of a match to the query’s intent.
How Google evolved from magician to mentalist
TF-IDF is still in use today and remains an important part of the information retrieval process. But its reliance on keywords fueled SEO tactics and tools built around “keyword density.” The objective was to force keywords into content, repeat them often, and place them in as many locations as possible. Even if a keyword was misspelled, or was just a string of unrelated words that made no sense, such as “best catnip Toronto cheap,” it would be crammed in, making the content appear equally nonsensical.
Luckily, Google introduced updates to its algorithm (such as Panda) to clean up poorly written content from its search results. But lexical search, on its own, still had many drawbacks. It often yielded search results that were irrelevant or disparate because it ignored intent, and it often failed to match what the user had in mind. This created three common yet significant challenges:
01. For those conducting a search, the tendency would be to repeat the process. They would try to guess better keywords to search for, or they would reword or refine their query with the hope of finding better matches. But, each repeated attempt would simply add to their frustration.
02. Many would click a search result that appeared to be a fit, realize that it wasn’t, return to the search engine, and move on to the next. They would bounce from result to result (called “pogosticking”) in the hope of finding the one that more closely met their needs.
03. On the flip side, web developers and SEO professionals would agonize over which keywords to cram into their content and how often to include them, without making the content look so robotic and contrived that it would drive both Google and users away.
Ultimately, lexical search, while effective at finding matches, can only guess what a user is searching for, limited as it is by the keywords and documents in its database. It’s like forcing an increasingly frustrated user to play along with a poor magician who, working strictly from a deck of cards, keeps failing to guess what the user has in mind: “Is this the one? No? How about this one? Still no? How about this one, then?”
With semantic search, however, Google went from being a failed card-guessing magician to being a skilled mentalist who can more closely match what the user had in mind. Going beyond a simple deck of cards, it looks for external cues and clues, and it makes connections between them. It gathers data and learns from it in order to make more accurate guesses. In other words, it’s making educated guesses.
In the same way, semantic search is Google’s way of going beyond keywords, capturing additional and external data, and then making connections between them. Educated guesses come from learning, but in Google’s case, it’s machine learning—or what people often refer to as artificial intelligence or AI.
The key components of semantic search
Google gathers information, adds it to a repository, and groups related topics together. It then makes connections based on how various pieces of information relate to one another. Through these interconnections, Google can understand what the concepts mean to each other and how they relate to the user’s query, which in turn helps it better match results to the user’s intended goal.
While the late Bill Slawski uncovered a related patent filed in 1999 by Google co-founder Sergey Brin, it wasn’t until 2013 that Google officially launched semantic search. After rolling out a major algorithm update dubbed “Hummingbird,” Google’s algorithm was able to better recognize concepts, extract them, and learn from them. As a result, its search results became considerably more efficient and accurate.
Hummingbird works with a model first introduced the year prior, called the knowledge graph. It’s simply a graphical representation of the semantic relationships between concepts. What it is and how it works can be a little technical to explain, and I’m no engineer by any stretch. But, in plain English, it’s simply an alternative classification system to label and structure content through entities, their properties, and the way they relate to one another. Here’s a look at what these are:
Entities are the concepts behind keywords. But unlike keywords, entities are not necessarily words or strings of words. They’re specific concepts in the real world that can represent people, places, events, ideas, organizations, and so on—or, as Google calls them, “things, not strings.”
As an example, I took the first paragraph of my online bio and ran it through Google’s Natural Language Processing API demo. It found the following entities, which are the colored words below:
Attributes define the categories to which entities belong, as well as their properties and characteristics. An entity by itself doesn’t mean anything; in context, it does. For example, the entity “Michel Fortin” is the name of a person, where “person” is a category and “name” is a characteristic. “Marketing advisor” is that person’s profession, and “SEO” is an industry he works in.
Keep in mind that these attributes are my own and may not reflect Google’s. Attributes depend on their chosen ontology (i.e., the method of formally classifying information). For example, how I choose to label my Gmail emails may not mean much to someone else who may have chosen a different set of labels. My labeling system is my own. Similarly, every domain has its own distinct knowledge graph and ontology, such as Wikipedia, Bing, and of course, Google Search.
Attributes may vary depending on the context, such as “cats” being either the mammal or the musical. It’s through their connections with others that Google can identify the proper one.
Like keywords, entities are meaningless by themselves. But through their connections with others, Google can understand their context. It’s the relationships between entities that give them meaning. As French composer Claude Debussy once remarked, “Music is the space between the notes,” not the notes themselves.
The knowledge graph and Google’s knowledge panels
A knowledge graph is simply a graphical representation of the collection of interlinked entities and their attributes. As opposed to a hierarchically linear system of files and folders, the knowledge graph looks like expanding clusters with connections in all directions.
They’re made up of nodes (i.e., entities) and edges (i.e., the connections between them). Labels help identify attributes and relationships. Here’s an example of a knowledge graph using a few more paragraphs from my bio:
For example, “Michel Fortin” is a person, and “marketing advisor” is a title. Both are nodes in the graph. Within this context, one can infer that the person holds that title in the field of marketing, so the edge between them is labeled “position held.” The tool calls these connections “facts,” and it collects facts and adds them to the knowledge graph. Here’s a look at a few of the facts that the above graph software found:
While the demo above used a select portion of my bio, Google continually collects and connects entities from various sources. It makes inferences in many ways, such as by analyzing other entities in proximity. For example, if a page with the word “cats” also mentions “Andrew Lloyd Webber” or the song “Memory,” then it’s likely not relevant to your search about Whiskers’ latest hairball incident.
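In code, this kind of fact collection can be sketched as a list of subject-predicate-object triples (a minimal Python illustration; the facts below mirror the bio example and are in no way how Google actually stores its graph):

```python
# Nodes are entities; edges are labeled relationships between them.
# Each (subject, predicate, object) triple is one "fact."
graph = []

def add_fact(subject, predicate, obj):
    graph.append((subject, predicate, obj))

def facts_about(entity):
    # Return every (predicate, object) pair where the entity is the subject.
    return [(p, o) for s, p, o in graph if s == entity]

add_fact("Michel Fortin", "is a", "person")
add_fact("Michel Fortin", "position held", "marketing advisor")
add_fact("marketing advisor", "field of work", "marketing")
```

Querying `facts_about("Michel Fortin")` then returns both the “is a” and “position held” facts, which is essentially what a knowledge panel does at a much larger scale.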
The knowledge graph also helps users quickly find information about an entity (as well as related information about it). Depending on certain factors, such as Google’s confidence in the facts it has gathered, Google will display content from its knowledge graph in “knowledge panels,” which usually appear beside the search results (on mobile, they appear in the main results column). Here’s an example of a knowledge panel:
How to get your own knowledge panel
Getting a knowledge panel (such as one about you, your brand, or your business) can significantly enhance your visibility in search results. To get one, you must first get into the knowledge graph.
Keep in mind that knowledge panels are never guaranteed. But getting into the knowledge graph, if anything, can help Google make better connections, which in turn can help improve the quality of your traffic.
First, search for the entity (e.g., yourself, your brand, etc.) to see if it’s already in the knowledge graph. If it is, you may be able to claim it and help Google improve it (via edits that can be suggested after you claim it). If it isn’t, Google must first recognize you in order to build an entity and be confident in the data it collects. This can take time, but there are a few shortcuts that can help expedite things.
Google recognizes peer-reviewed sites, journals, and publications as authoritative sources of information. Some of these include Wikipedia, Wikidata, Google Scholar, industry journals, accreditation associations, academic institutions, national news publications, and several others.
Become a contributor and create a page on these platforms. The approval process may take time, so be patient.
Some authoritative resources can be tough to get into and take time. (Examples include getting your own page on Wikipedia, getting academic citations on Google Scholar, having peer-reviewed articles published in industry journals, getting published in Harvard Business Review, being featured in The New York Times, etc.) But one simple way to get some basic traction is through social media.
Create profiles about the entity you want, whether it’s you, your business, or your organization. Focus on the top social networks like Twitter, LinkedIn, Facebook, Reddit, Quora, and so on. Make sure to include links to those pages and profiles on your website, too.
Along with creating social media profiles, claim listings in industry directories, licensing bodies, alumni associations, peer review sites, trade journals, and certification organizations, such as Yelp, Trip Advisor, BBB, RateMD, FindLaw, Chamber of Commerce, etc.
Like Google Business Profile (GBP), these listings may help your local SEO efforts. But any mention of you or your business in such directories (such as in user-generated reviews and ratings), whilst less authoritative, can still help authenticate your brand or business and lend it credibility.
With all the above, the next step is to ensure Google can find the information and make the connections (e.g., establish that a profile on an external site is indeed you). Sometimes, this means adding links to your profiles on a distinct page of your site, which may be an about page, an online bio, the company’s history page, and so on. Other times, it means adding guiding information on your site using snippets of data called “schema.”
Schema markup is structured data you can add to your website’s HTML code; it’s meant for search engines and isn’t displayed to visitors. SEOs and site owners implement schema to give search engines additional context about their content (such as identifying an author or organization). While it can improve your chances of appearing in search results, it can also help search engines establish important connections, which increases your chances of getting into the knowledge graph. It can even increase your chances of getting your own knowledge panel and improving it.
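As an illustration, here is what a minimal Person schema might look like, built as a Python dictionary and serialized to the JSON-LD that would sit inside a `<script type="application/ld+json">` tag on an about page. The name and organization come from the bio in this article; the profile URLs are placeholders, not real addresses:

```python
import json

# Minimal Person schema using the schema.org vocabulary.
# The "sameAs" URLs below are illustrative placeholders.
person_schema = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Michel Fortin",
    "jobTitle": "Marketing Advisor",
    "worksFor": {"@type": "Organization", "name": "Musora Media"},
    # "sameAs" links tie this entity to its profiles elsewhere on the web,
    # helping search engines connect the dots.
    "sameAs": [
        "https://www.linkedin.com/in/example",
        "https://twitter.com/example",
    ],
}

# This JSON is what would be embedded in the page's HTML.
print(json.dumps(person_schema, indent=2))
```

The `sameAs` property is the piece doing the connective work here: it tells search engines that the entity on your site and the profiles on external platforms are one and the same.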
To some degree, this may seem like you’re hacking the search engine or spoon-feeding it with information. But, remember that the above information is already available on the web (or should be). Schema is simply a way to point Google in the right direction and help it make those connections. After all, Google is only a robot, and humans can often be subjective and fuzzy—which, in fact, leads me to my next point.
The role of machine learning in semantic search
One of Google’s challenges is trying to guess for which body of knowledge a search is intended. Humans are a fickle lot, and things can change depending on the situation. What if a user’s needs change? What if a concept changes and no longer meets the user’s needs? What if the interpretation of a certain concept varies from person to person, time to time, or place to place?
Take, for instance, the word “mask.” In late 2019, search results would be about costumes and cosmetics. But just a few months later, they switched to medical masks and rules around mask use (due to the COVID-19 pandemic). Jumping from appearance to prevention in such a short period is an extraordinary leap in meaning. But it’s not rare; shifts like this happen all the time, albeit in smaller and less obvious ways.
This is where machine learning comes in. It helps Google get better at recognizing entities within a document, and at identifying and inferring the relationships between them. Moreover, as its models learn, Google introduces new deep-learning algorithms to improve its capacity to interpret information. Some examples include the following:
RankBrain is part of the Hummingbird algorithm and learns to better interpret intent based on a number of factors, such as location, time, input, news, signals, etc. Remember the “mask” example from earlier? RankBrain may have played a role.
BERT (or Bidirectional Encoder Representations from Transformers) learns to better interpret conversational queries (e.g., questions) by pre-training its algorithm to learn how to recognize words from both the left and right (hence, bidirectional).
SMITH (or Siamese Multi-depth Transformer-based Hierarchical encoder for long-form document matching) is similar to BERT, but rather than short texts such as conversational queries, it learns to better interpret information from larger (i.e., long-form) blocks of text.
MUM (or Multitask Unified Model), which is a more recent addition, learns to better interpret information from other sources, such as different languages or formats beyond text-only content (e.g., images, video, maps, etc.).
These are important algorithm updates that are part of semantic search to some degree. But, there are several others that are either similar or aim to improve existing algorithms. It’s unclear if Google uses all of these algorithms all of the time. They may be used only for testing or training. The bottom line is that Google keeps learning by improving how it learns and not just what it learns.
As Google said, it’s all about things, not strings
Nevertheless, when it comes to SEO, doing keyword research is helpful. Restricting yourself to keywords, however, may not be. If things ever change (and they always do), your rankings will drop or disappear as a result. When they do, your tendency may be to create more of the same content or to inject more of the same keywords. This approach is not only unhelpful; it can be harmful.
Google is making educated guesses with the help of machine learning, and its capabilities are changing and improving all the time; so should your SEO efforts. By adopting a user-first SEO approach, and by learning what your users want beyond the keywords they’re using, you’re helping Google, too. As American entrepreneur Jim Rohn once said, “If someone is going down the wrong road, they don’t need motivation to speed them up; what they need is education to turn them around.”
Michel Fortin - VP of Digital Marketing at Musora Media
Michel Fortin is a marketing advisor, author, speaker, and the VP of Digital Marketing at Musora Media, the company behind Drumeo. For nearly 30 years, he has worked with clients from around the globe to improve their visibility, build their authority, and grow their business.