
Intro to technical SEO: A guide to improving crawling and indexing for better rankings

Author: Aleyda Solis


The highest quality content on the web won’t get any search traffic if technical configurations aren’t correctly optimized for effective crawling and indexing.


On the other hand, stellar technical SEO can help guide search engines (and users) to your most important pages, enabling you to bring in more traffic and revenue.


In this article, I’ll guide you through the key concepts, configurations, and criteria necessary to fully leverage technical SEO for your website. Let’s begin.






Technical SEO: What it is and why it’s important


Technical SEO is the practice of optimizing your website configurations to influence its crawlability, rendering, and indexability so that search engines can effectively access and rank your content. 


This is why technical SEO is considered essential and one of the main pillars of the SEO process. 


The three pillars of SEO are content, backlinks, and technical SEO.

It’s referred to as ‘technical’ because it doesn’t pertain to optimizing on-page content, but rather optimizing the technical configurations (e.g., HTTP status, internal linking, meta robots tags, canonicalization, XML sitemaps) with the goal of ensuring that search engines can access your content. 


It’s crucial to understand that while you don’t need to be a web developer or know how to code to handle technical SEO, you do need to grasp the basics of how websites are constructed. 

This includes understanding HTML and how other web technologies, like HTTP and JavaScript, function. This knowledge helps you evaluate and confirm that your website is optimized effectively for search.


Overlooking technical SEO can lead to your pages not appearing in search results, ultimately resulting in lost opportunities for rankings, traffic, and the revenue that comes with it.


The fundamental technical SEO concepts: Crawlability, indexability, and rendering


A screenshot of the Page indexing report in Google Search Console, showing the proportion of indexed pages to non-indexed pages. There’s a chart showing the trend over time, with the number of indexed pages growing and the number of non-indexed pages shrinking.
Crawlability is the first step to getting your pages indexed by Google.

Search engines, like Google, begin the process of providing results to users by accessing website pages (whether they’re text, images, or videos)—this is known as crawling.


Once they’ve accessed and downloaded this content, they analyze it and store it in their database—this is known as indexing.


These are key phases of the search process and you can influence them through the technical setup of your website. 


Let's take a closer look at each of these phases to understand how they function, and why and how you’d want to optimize them.


  • Crawlability: Search engines discover your website pages through a process called ‘crawling’. They use ‘crawlers’ (also known as ‘spiders’ or ‘bots’) that browse the web by following links between pages. Search engines can also find pages through other means, like XML sitemaps or direct submissions through tools like Google Search Console. Some search engines (including Microsoft Bing, Yandex, Seznam.cz, and Naver) use the IndexNow protocol (which Wix supports) to speed up discovery when you create or update content. Popular search engines have their own crawlers with specific names. For instance, Google’s crawler is called ‘Googlebot’. Websites can control which search engines access their content through a file called robots.txt, which sets rules for crawling. To ensure search engines can find and access important pages while preventing them from accessing unwanted ones, it’s crucial to optimize your technical configurations accordingly.


  • Indexability: After a search engine crawls a webpage, it analyzes its content to understand what it’s about. This process, known as indexing, involves evaluating the text-based content as well as any images or videos. In addition to HTML pages, search engines can often index content from text-based files, like PDFs or XMLs. However, not every crawled page will get indexed. This depends on factors like the originality and quality of the content, certain HTML configurations like meta robots and canonical annotations, and reliance on JavaScript for key design and content rendering, which can make indexing difficult. During indexing, search engines check if a page is a duplicate of others with similar content and select the most representative one (referred to as the ‘canonical page’) to display in search results. Therefore, it’s crucial that you correctly configure and optimize these different elements to ensure effective page indexing.


  • Rendering: If your website utilizes client-side JavaScript, search engines need to perform an additional step called ‘rendering’ to index your content. Client-side JavaScript rendering involves using JavaScript to create HTML content dynamically in the browser. Unlike server-side rendering, where HTML is generated on the server and sent to the browser, client-side rendering starts with a basic HTML file from the server and uses JavaScript to fill in the rest. Because of this, search engines have to execute the JavaScript before they can see the content. While search engines like Google and Bing can render JavaScript to index the page, it requires more resources and time, and you might encounter limitations when relying on client-side rendering on a large scale. That’s why, when using JavaScript, it’s best to opt for server-side rendering to make indexing easier.


Technical SEO configurations to understand and optimize


Now that you understand the considerations that technical SEO seeks to optimize, let’s look at the different configurations that influence your technical SEO and how to optimize them to maximize your organic search visibility. 


I’ll cover:

  • HTTP status
  • URL structure
  • Website links
  • XML sitemaps
  • Robots.txt
  • Meta robots tags
  • Canonicalization
  • JavaScript usage
  • HTTPS usage
  • Mobile friendliness
  • Structured data
  • Core Web Vitals
  • Hreflang annotations

HTTP status

HTTP status codes are numerical responses from your web server when a browser or search engine requests a page. These codes indicate whether the request was successful or an issue occurred.


Here are key HTTP status codes and their implications for SEO:


  • 2xx (success): 

    • 200 OK — Page successfully found and available for indexing assessment.


  • 3xx (redirection):

    • 301 moved permanently — This indicates a permanent move to another URL; it transfers the SEO value of the former URL to the final destination. That’s why SEOs use 301 redirects when performing a website migration, changing a URL, or when removing a page that used to attract rankings, traffic, and backlinks.

    • 302 found — This indicates a temporary move and doesn’t transfer the former URL’s SEO value to the target page.


  • 4xx (client errors):

    • 404 not found — This indicates that the page was not found. A high number of 404 errors can impact your site’s crawl budget (i.e., the amount of time and resources a search engine dedicates to crawling your website).

    • 410 gone — This indicates an intentional and permanent removal. This can be useful for de-indexing a page if it doesn’t have any rankings, traffic, or links.


  • 5xx (server errors): 

    • 500 internal server error — This indicates the server failed to fulfill a request. This can be harmful to your SEO if not resolved.

    • 503 service unavailable — This code indicates that a page is temporarily unavailable and can be used for website maintenance without impacting your SEO. You can use this status code to tell search engines to come back later.


  • Soft 404 errors: These occur when a page returns a 200 OK status but lacks content or shows an error message, suggesting the page no longer exists and offering a poor user experience. For permanently relocated content, use a 301 redirect. For removed content, redirect to the parent category if the page had value, or use a 410 status if it didn’t.
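
To check which status codes your key URLs actually return, you can request them directly. Here’s a minimal sketch using Python with the requests library (an assumption, as is the placeholder URL list); a crawling tool or your server logs will give you the same information at scale.

import requests  # assumes the requests library is installed

# Placeholder URLs; swap in the pages you want to audit.
urls = [
    "https://www.example.com/",
    "https://www.example.com/old-page/",
]

for url in urls:
    # allow_redirects=False reports the first response (e.g., a 301)
    # instead of the status of the final destination.
    response = requests.get(url, allow_redirects=False, timeout=10)
    print(url, response.status_code, response.headers.get("Location", ""))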



URL structure

A well-designed URL structure is important for both search engines and users to understand the content of your webpages. 


Here are some widely accepted best practices for URL structure:


  • Keep URLs simple, short, lowercase, and descriptive, using meaningful words instead of IDs.

  • Use hyphens to separate words. Avoid underscores, spaces, or concatenation.

  • Avoid generating multiple URLs for the same content, such as through session IDs or excessive parameters.

  • Maintain a logical folder structure without going too deep to prevent overly long and complex URLs.

  • Consistently use trailing slashes or non-trailing slashes to avoid duplicate content issues, and use 301 redirects to enforce canonical URLs.


Good URL structure example (illustrative): https://www.example.com/women/summer-dresses

Poor URL structure example (illustrative): https://www.example.com/catalog.php?cat=72&sessionid=93f2a


Website links

Links are crucial for search engines to discover new pages and for users to navigate your site. To optimize your website’s links, implement the best practices below.


  • Include navigation links: Utilize main menus, footer links, and editorially placed links within your content to enhance crawlability and browsing experience.

  • Use HTML tags: Use the <a href=""> HTML tag for links to ensure crawlability and avoid JavaScript-based links.

  • Create descriptive anchor text: Use descriptive, relevant anchor text that accurately describes the linked page, incorporating targeted keywords when possible. Avoid generic terms like ‘click here’ or ‘read more’.

  • Link to canonical URLs: Directly link to canonical, indexable URLs. Avoid linking to pages that redirect or trigger errors.

  • Link to absolute URLs: Use full URLs instead of relative URLs to avoid broken or ambiguously resolved links.

  • Structure and prioritize your linking strategy: Follow a logical, hierarchical structure for internal linking, prioritizing high-value pages. Cross-link between similar pages to aid both users and search engines.

  • Avoid nofollow for internal and trusted external links: Generally, internal links should be followed by default. Reserve the rel="nofollow" attribute for when you don’t want to pass link equity.
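
If you want to see your links the way a crawler does, you can parse a page’s HTML and list every <a href> element with its anchor text. Below is a minimal sketch using Python’s built-in html.parser; the HTML snippet is a placeholder.

from html.parser import HTMLParser

# Placeholder HTML; in practice you would feed in a downloaded page.
html = '<a href="/summer-dresses/">Summer dresses</a> <a href="#" onclick="go()">Click here</a>'

class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.current_href = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href")

    def handle_data(self, data):
        if self.current_href is not None and data.strip():
            self.links.append((self.current_href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None

parser = LinkExtractor()
parser.feed(html)
for href, anchor in parser.links:
    # Review the output for missing hrefs or generic anchor text like 'Click here'.
    print(href, "->", anchor)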


XML sitemaps

XML sitemaps are files (in XML format) that tell search engines about the essential, indexable files of your website, such as pages, videos, or images, and their relationships. They aid search engines in efficiently crawling and indexing this content.


While not mandatory, XML sitemaps are recommended for highly dynamic or large websites with thousands of URLs (or more). They complement internal links, helping search engines discover URLs within a site.


There are various types of XML sitemaps, including general, video, image, and news sitemaps. Most web platforms automatically generate and update XML sitemaps when you add or remove pages.


Considerations for creating XML sitemaps include: 


  • Adhering to size limits (50MB uncompressed or 50,000 URLs)

  • UTF-8 encoding

  • Placing them at the root of the site


URLs within sitemaps should be absolute references.


Here’s an example of an XML sitemap that includes only one URL:


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/page.html</loc>
    <lastmod>2020-08-30</lastmod>
  </url>
</urlset>
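
If you want to sanity-check an existing sitemap against these limits, you can parse it and review its <loc> entries. Here’s a minimal sketch using Python’s standard library; the sitemap URL is a placeholder.

import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP_URL) as response:
    tree = ET.parse(response)

locs = [loc.text.strip() for loc in tree.findall(".//sm:loc", NS)]
print(f"{len(locs)} URLs listed (the limit is 50,000 per sitemap file)")

# Sitemap URLs should be absolute references.
relative = [u for u in locs if not u.startswith(("http://", "https://"))]
print(f"{len(relative)} entries are not absolute URLs")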

Robots.txt

The robots.txt file, located at a website’s root, controls which pages search engines can access and how quickly they can crawl them.


Use it to prevent website overload, but don’t rely on it to keep pages out of Google’s index. The file must be UTF-8 encoded, respond with a 200 HTTP status code, and be named “robots.txt”. 


Your robots.txt file consists of groups of rules, each starting with a user-agent directive specifying the crawler. Allowed rules include:


  • User-agent — Specifies which crawlers should follow your rules.

  • Disallow — Blocks access to a directory or page using relative paths.

  • Allow — Overrides a disallow rule to allow crawling of a specified directory or page.

  • Sitemap — Optionally, you can include the location of your XML sitemap.


Here are a few examples of what a robots.txt file can look like:


# Example 1: Block all crawlers
User-agent: *
Disallow: /

# Example 2: Block access of the Googlebot to a directory
User-agent: Googlebot
Disallow: /subdirectory/

# Example 3: Block access of the Googlebot to a page
User-agent: Googlebot
Disallow: /page.html
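
To confirm that rules like these behave as intended, you can test them with Python’s built-in urllib.robotparser. A minimal sketch, with rules and URLs as placeholders mirroring example 2 above:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /subdirectory/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Prints False: the directory is blocked for Googlebot.
print(parser.can_fetch("Googlebot", "https://www.example.com/subdirectory/page.html"))
# Prints True: everything else remains crawlable.
print(parser.can_fetch("Googlebot", "https://www.example.com/page.html"))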

Meta robots tags

Meta robots tags are placed in a page’s HTML head or HTTP header to provide search engines with instructions on that particular page’s indexing and link crawlability.


An example of a meta robots tag:

<!DOCTYPE html>
<html>
  <head>
    <meta name="robots" content="noindex">
    <!-- Other head elements -->
  </head>
  <body>
    <!-- Body content -->
  </body>
</html>

In the example above, the meta robots tag includes the "noindex" directive, telling search engines not to index the page. Neither the name nor the content attribute values are case-sensitive.


Allowed directives include:


  • "noindex" — This prevents page indexing.

  • "index" — This allows page indexing (it is also the default, if not otherwise specified).

  • "follow" — Allows search engines to follow links on the page.

  • "nofollow" — This prevents search engines from following links on the page.

  • "noimageindex" — This prevents indexing of images on the page.


You can combine these directives in a single meta tag (separated by commas) or place them in separate meta tags.


For example, here’s a single meta tag combining two directives:

<meta name="robots" content="noindex, nofollow">
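
For non-HTML resources (such as PDFs), the same directives can be delivered via the X-Robots-Tag HTTP response header rather than a meta tag. Here’s a minimal sketch that checks for it using Python with the requests library; the URL is a placeholder.

import requests  # assumes the requests library is installed

# Placeholder URL for a PDF you don't want indexed.
response = requests.head("https://www.example.com/files/internal-report.pdf", timeout=10)

# A response header such as "X-Robots-Tag: noindex" applies the same
# directive to this file that a meta robots tag applies to an HTML page.
print(response.headers.get("X-Robots-Tag"))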

Canonicalization

Canonicalization refers to selecting the main version of a page when multiple versions or URLs exist, therefore preventing duplicate content issues. Duplicate content can result from URL protocol variations (HTTP and HTTPS), site functions (URLs with parameters resulting from filtering categories), and so on.


Search engines choose the canonical version based on signals like HTTPS usage, redirects, XML sitemap inclusion, and the <link rel="canonical"> annotation.


Practical methods to specify the canonical URL include:


  • 301 redirects — You can simply direct users and crawlers to the canonical URL.

  • <link rel="canonical"> annotations — Specify the canonical URL within the page’s HTML <head>.

  • XML sitemap inclusion — This signals the preferred URL to search engines.


301 redirects are ideal when only one URL should be accessible, while <link rel="canonical"> annotations and XML sitemap inclusion are better when duplicate versions need to remain accessible.


Canonical annotations are typically placed within the HTML <head> or HTTP headers, pointing to the absolute URL of the canonical page. For example:


<html>
  <head>
    <title>Technical SEO Guide and Concepts</title>
    <link rel="canonical" href="https://example.com/technical-seo/" />
    ...
  </head>

For non-HTML files like PDFs, you can implement canonical tags through the HTTP header. 
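
Here’s a minimal sketch of how you might verify that, again using Python with the requests library and placeholder URLs: request the file and read its Link response header.

import requests  # assumes the requests library is installed

# Placeholder: a PDF whose canonical should point to the HTML version.
response = requests.head("https://www.example.com/files/technical-seo.pdf", timeout=10)

# A canonical set via the HTTP header looks like:
# Link: <https://www.example.com/technical-seo/>; rel="canonical"
print(response.headers.get("Link"))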


JavaScript usage

JavaScript can enhance website interactivity, but some sites also use it for client-side rendering (where the browser executes JavaScript to dynamically generate page HTML). 


This adds an extra step for search engines to index content, requiring more time and resources, which can result in limitations at scale. That’s why server-side rendering is recommended instead. 


Some web platforms, like Wix, use server-side rendering to deliver both JavaScript and SEO tags in the most efficient way possible.


If you can’t avoid client-side rendering, follow these best practices:


  • Ensure links are crawlable using the <a> HTML element with an href attribute.

  • Each page should have its own URL, avoiding fragments to load different pages.

  • Make the resources needed for rendering crawlable.

  • Maintain consistency between raw HTML and rendered JS configurations, like meta robots or canonical tags.

  • Avoid lazy loading above-the-fold content for faster rendering.

  • Use search engine tools, like Google’s URL Inspection tool, to verify how pages are rendered.
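
One quick way to gauge how much a page depends on client-side rendering is to fetch its raw HTML (what crawlers receive before any JavaScript runs) and check whether your key content is already in it. Here’s a minimal sketch using Python with the requests library; the URL and phrase are placeholders, and Google’s URL Inspection tool remains the authoritative check.

import requests  # assumes the requests library is installed

url = "https://www.example.com/technical-seo/"  # placeholder page
key_phrase = "Intro to technical SEO"           # content that should be indexable

# The raw HTML is what a crawler sees before executing any JavaScript.
raw_html = requests.get(url, timeout=10).text

if key_phrase in raw_html:
    print("Key content is present in the initial HTML.")
else:
    print("Key content only appears after rendering; consider server-side rendering.")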


HTTPS usage

HTTPS (Hypertext Transfer Protocol Secure) is crucial for sites handling sensitive information as it encrypts data exchanged between users and your website.


Search engines, like Google, use HTTPS as a ranking signal, prioritizing secure connections in search results for better user experience. To ensure security, all pages and resources (images, CSS, JS) should be served via HTTPS.


Migrating to HTTPS involves:


  • SSL/TLS certificate — Purchase and install this on your web server.

  • Server configuration — Configure the server to use the certificate.

  • Redirects — 301 redirect all HTTP URLs to their HTTPS equivalents.


For a smooth transition:


  • 301 redirect — Ensure all URLs permanently redirect to HTTPS.

  • Update internal links — Update internal links to HTTPS.

  • External resources — Check external resources (e.g., CDNs) for HTTPS support.

  • Mixed-content warnings — Resolve any mixed content issues (i.e., when secure HTTPS pages load resources over insecure HTTP), ensuring all content is loaded via HTTPS to avoid browser warnings.
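
Once the migration is live, you can spot-check the redirects by requesting an HTTP URL without following redirects and confirming it returns a 301 pointing to its HTTPS equivalent. A minimal sketch using Python with the requests library and a placeholder URL:

import requests  # assumes the requests library is installed

http_url = "http://www.example.com/technical-seo/"  # placeholder

response = requests.get(http_url, allow_redirects=False, timeout=10)
print(response.status_code)              # should be 301
print(response.headers.get("Location"))  # should be the https:// equivalent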


Mobile friendliness

Search engines, like Google, prioritize mobile-friendly websites, using mobile crawlers to primarily index mobile content for ranking (as opposed to desktop content). 


To provide a positive mobile experience, ensure that your site has a well-configured mobile version that fits mobile devices of various screen sizes correctly.


These are the three main configurations for mobile-friendly sites:


  • Responsive design — The same HTML code on the same URL, displaying content differently based on screen size via CSS. This is the method that Google recommends because it’s the easiest to implement and maintain.

  • Dynamic serving — The same URL, but serving different HTML based on user-agent.

  • Separate URLs — Different HTML for each device on separate URLs.


Regardless of the configuration, ensure mobile and desktop versions have equivalent crawlability, indexability, and content configurations (titles, meta descriptions, meta robots tags, main content, internal links, structured data, etc). 


Allow search engines to crawl resources used in both versions (images, CSS, JavaScript). Avoid lazy-loading for primary content and ensure that all content visible in the viewport is automatically loaded.


Optimizing these elements will help search engines effectively access and index the mobile version of your site, improving its visibility and ranking.


Structured data 

Structured data helps search engines understand and classify a page’s content, leading to enhanced search listings known as ‘rich results’.


Three images. One shows the visible webpage. The next image shows the structured data on the page. And the final image shows the page as an event rich result in Google.

Popular structured data types for generating rich results include: breadcrumb, logo, event, FAQ, how-to, image metadata, product, Q&A, recipe, review, software, and video.


You can implement structured data in three main formats:


  • JSON-LD — Recommended for ease of implementation and maintenance at scale, JSON-LD uses JavaScript notation embedded in HTML.

  • Microdata — This format uses HTML tag attributes to nest structured data within HTML content.

  • RDFa — This format is an HTML5 extension supporting linked data using HTML tag attributes.


Google’s Rich Results Test tool validates structured data and provides previews in Google Search.


Here is an example of JSON-LD structured data for a recipe page:


<html>
  <head>
    <title>Spanish Paella</title>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org/",
      "@type": "Recipe",
      "name": "Paella",
      "author": {
        "@type": "Person",
        "name": "Maria Perez"
      },
      "datePublished": "2020-10-10",
      "description": "A delicious Spanish paella with the original Valencian ingredients.",
      "prepTime": "PT60M"
    }
    </script>
  </head>

Core Web Vitals

Core Web Vitals (CWV) measure user experience for loading, interactivity, and the visual stability of a page. Google considers them in its ranking systems. 


The three main CWV metrics are:


  • Largest Contentful Paint (LCP) — This measures loading performance by considering the render time of the largest visible image or text block.

  • Interaction to Next Paint (INP) — This metric observes the latency of all click, tap, and keyboard interactions that occur throughout the lifespan of a user’s visit to a page.

  • Cumulative Layout Shift (CLS) — This measures visual stability by assessing unexpected layout shifts during a page’s lifespan.


Google Search Console provides insights into Core Web Vitals performance, which is crucial for site audits. You can improve Core Web Vitals by:


  • Removing unused JavaScript — Avoid loading unnecessary internal or external JavaScript.

  • Using next-gen image formats — Optimize images using lightweight formats like WebP for smaller file sizes without quality loss.

  • Caching static assets — Store assets like images, CSS, and JavaScript in the browser cache to reduce loading time.

  • Eliminating render-blocking resources — Asynchronously load external JavaScript to allow the browser to continue parsing HTML.

  • Sizing images appropriately — Specify image dimensions to allocate space on the screen, reducing layout shifts.
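
Beyond Google Search Console, you can also pull Core Web Vitals field data programmatically. The sketch below assumes Google’s PageSpeed Insights API (the v5 runPagespeed endpoint) and Python’s requests library; the URL is a placeholder, and the exact response fields may vary, so treat it as a starting point rather than a finished audit script.

import requests  # assumes the requests library is installed

API = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {"url": "https://www.example.com/", "strategy": "mobile"}  # placeholder URL

data = requests.get(API, params=params, timeout=60).json()

# Field data, when available, is reported under "loadingExperience".
metrics = data.get("loadingExperience", {}).get("metrics", {})
for name, value in metrics.items():
    print(name, value.get("percentile"), value.get("category"))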


Hreflang annotations

Hreflang annotations are useful for indicating the language and regional targeting of a page and its alternate versions to search engines like Google. 


There are three main methods for implementing hreflang:


  • HTML — Add hreflang tags to the page’s HTML <head> section using <link> elements. 


An example of hreflang tags in the HTML <head> section:

<link rel="alternate" hreflang="en-gb" href="https://example.com/en-gb/" />
<link rel="alternate" hreflang="es-mx" href="https://example.com/es-mx/" />
<link rel="alternate" hreflang="es-es" href="https://example.com/es-es/" />
<link rel="alternate" hreflang="de-de" href="https://example.com/de-de/" />

  • HTTP header — Implement hreflang via the HTTP header for non-HTML files, like PDFs. 


An example of hreflang annotations in the HTTP header:

Link: <https://example.com/file-en.pdf>; rel="alternate"; hreflang="en",
      <https://es.example.com/file-es.pdf>; rel="alternate"; hreflang="es",
      <https://fr.example.com/file-fr.pdf>; rel="alternate"; hreflang="fr"

  • XML sitemap — Include hreflang annotations in an XML sitemap.
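
To illustrate the XML sitemap method, each <url> entry lists an xhtml:link alternate for every language/region version of the page (including itself). Here’s a minimal sketch that builds one such entry with Python’s standard xml.etree.ElementTree, using placeholder URLs.

import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)

urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = "https://example.com/en-gb/"

# Every alternate version, including the page itself, gets an xhtml:link.
alternates = {
    "en-gb": "https://example.com/en-gb/",
    "es-es": "https://example.com/es-es/",
    "de-de": "https://example.com/de-de/",
}
for hreflang, href in alternates.items():
    ET.SubElement(
        url, f"{{{XHTML_NS}}}link",
        rel="alternate", hreflang=hreflang, href=href,
    )

print(ET.tostring(urlset, encoding="unicode"))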