Debunking LLMs.txt Myths: What You Need to Know for AI Visibility

Nov 11, 2025

LLMs.txt has been debated in generative engine optimization (GEO) and SEO circles with surprising vigor. The core question is whether the technique demonstrably helps grow AI visibility.

In this article, I'll explore whether LLMs.txt is being surfaced in classic search and AI, and its potential GEO value.

Opinions abound, and myths are everywhere, so let's break some.

Myth: LLMs.txt does nothing for "generative engine optimization"

Let’s start with the elephant in the room: Can LLMs.txt boost your AI visibility?

To find out whether LLMs.txt is a good GEO strategy, I did a lot of research. I reviewed over 1,400 LLMs.txt files and pored over documentation from Google, OpenAI, Perplexity, and others. I’ve scoured GitHub repos and manually reviewed search results and AI responses for dozens of queries.

As of November 2025, there is demonstrable evidence that LLMs.txt files are surfacing on a number of search and AI platforms, including:

Google's index
Google's organic rankings
Google AI Mode
AI search results in ChatGPT and Perplexity

There’s also data to illustrate that LLMs.txt files are being used widely around the web because they can require a fraction of the tokenization cost of a standard webpage, making them well suited to the agentic era of the web.

To reduce that to nothing would be inaccurate.


Myth: LLMs.txt isn’t indexed in Google results

Yes, LLMs.txt can be indexed in Google results. Evidence shows that Google is indexing (and therefore crawling) LLMs.txt.

As of October 2025, Google had indexed between 30,000 and 60,000 LLMs.txt files globally. I tested Google Advanced Search over two weeks (using site:*/llms.txt) and found that the number of indexed URLs fluctuated within this range.

Google screenshot from October 16, 2025

With a few exceptions, the vast majority of results are valid LLMs.txt markdown pages. This directly contradicts reports that “Google does not crawl LLMs.txt” as circulated in July 2025 and repeated across SEO web forums like Reddit.

Pages cannot be indexed without being crawled, meaning that these pages are not being ignored by Google. Those who have implemented LLMs.txt can observe activity on these pages in Google Search Console to see how and when they are being crawled.

To be clear, the intended audience for LLMs.txt is LLMs, not classic search engines. But the widespread claim that Google does not crawl LLMs.txt, or ignores it completely, is demonstrably false.

Myth: LLMs.txt pages do not rank in classic search

LLMs.txt pages can rank. Bulk analysis of 586 indexed LLMs.txt files found that almost 6% of indexable LLMs.txt pages were ranking for organic keywords. These included keywords that appear verbatim in the file, as well as more general and branded keywords that Google surfaces semantically for relevant content.

Table showing a list of website targets with columns for mode, Ahrefs ratings, and organic search data. Contains stats and blue text items.

The presence of LLMs.txt in semantic keyword ranking suggests that the content is being crawled, indexed, and contextually understood by Google’s web crawlers. These same crawlers are used to provide grounded search results to support Google Gemini and AI Mode when generating AI responses.

Flowchart showing how Gemini processes a user's question about the Euro 2024 winner, searches online, and finds Spain beat England 2-1. — How grounding with Google search works

This means that an LLMs.txt file that's surfaced in search results could potentially also be used in an LLM response as part of this search-based RAG retrieval process on web-enabled AI search. Additional research is needed, but in the case where LLMs.txt ranks highly for general keywords, it seems most likely to occur when relevant information is easiest to access via the LLMs.txt file.

For instance, despite the fact that an exact match or verbatim search for “inngest docs” does not surface the respective LLMs.txt file, a semantic search for Inngest docs returns the LLMs.txt on the first page of Google.

Google search results for "inngest docs" showing links to documentation and resources like "Next.js Quick Start" and "API reference."

There’s also the case of Redbus, where the LLMs.txt ranks in 8th position for the keyword redbus customer care email id, which has around 900 monthly searches. Although the LLMs.txt for this site doesn’t contain the keyword verbatim, it ranks organically and even returned a featured snippet in one test, with a jump-to-text link.

Search results for "redbus customer care email id" showing contact info: phone, email (support@redbus.com), and live chat options.

In both cases, the other Google search results for the queries do not have the same search snippets, suggesting that the content in the LLMs.txt is unique, potentially useful to users (as well as AI agents), and thus worth ranking. Data from Ahrefs shows that, in the case of Redbus, the LLMs.txt page also receives clicks, which is an additional signal of user value.

The presence of this content in search sends additional value signals to LLMs. Content that's regularly shown in search and receives clicks is seen as more valuable by AI search platforms like AI Mode, ChatGPT, and Perplexity, making it more likely to be included in a RAG pipeline as well as to rank in search.
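The search-based retrieval step described in this section can be pictured with a toy sketch. Everything here is an assumption for illustration: the example.com pages, the term-overlap scoring, and the prompt wording are hypothetical, and this is not how Google, ChatGPT, or Perplexity actually rank or ground answers. It only shows the general shape of "retrieve pages first, then answer from them."

```python
import re

# Hypothetical pages; the URLs and text are invented for illustration.
PAGES = {
    "https://example.com/": "Welcome to Example Travel. Book bus tickets online.",
    "https://example.com/llms.txt": (
        "# Example Travel\n> Bus booking service.\n"
        "## Support\nCustomer care email: support@example.com"
    ),
    "https://example.com/blog": "Ten tips for a comfortable overnight bus journey.",
}


def terms(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word-like terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, pages: dict[str, str], k: int = 2) -> list[str]:
    """Return the URLs of the k pages sharing the most terms with the query."""
    wanted = terms(query)
    ranked = sorted(pages, key=lambda url: len(wanted & terms(pages[url])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pages: dict[str, str], k: int = 2) -> str:
    """Assemble a grounded prompt: retrieved page text first, then the question."""
    sources = retrieve(query, pages, k)
    context = "\n\n".join(f"Source: {url}\n{pages[url]}" for url in sources)
    return f"{context}\n\nAnswer using only the sources above.\nQuestion: {query}"


query = "example travel customer care email"
print(retrieve(query, PAGES))
print(build_prompt(query, PAGES))
```

In this toy setup the LLMs.txt page is retrieved first simply because it holds the most relevant text for the query, which mirrors the hypothesis that a file ranking in search can end up in a grounded AI answer.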

Myth: LLMs.txt pages can’t show in AI Mode, ChatGPT, and Perplexity

There’s evidence that LLMs.txt can surface as part of answers in AI-first channels like AI Mode, ChatGPT, and Perplexity. In these cases, the visibility of LLMs.txt can vary, with LLMs.txt surfacing seemingly as a result of RAG but also as an independent crawl. Some argue that accessing LLMs.txt via RAG is not the intended use, but the original documentation explains that the aim is merely to “provide LLM-friendly content.”

Truth: LLMs.txt can show in AI search, likely via RAG

In this AI Mode example, LLMs.txt showed as the top result for the redbus customer care email id. Google stated that AI Mode was created to “dive deeper into the web than a traditional search on Google” by drawing from search results. So, the fact that a ranking page that could potentially satisfy the query is being served here should come as no surprise.

Search results show redBus customer care details: email, phone support, grievance officer contact, and live chat options on a website. — LLMs.txt showing in AI Mode

In ChatGPT and Perplexity, Redbus was also included as part of the results for this query, suggesting that each of these answers was grounded with the same Google search results.

Text detailing redBus customer care emails and phone numbers for India, displayed with a citations panel highlighting source documents.

We see similar behavior for the “inngest docs” query. Since the LLMs.txt ranks in classic search, the file also shows up in Perplexity’s Google-search-informed sources for the same query.

As mentioned earlier, this page ranks organically on page 1 for a query with decent search volume, so my hypothesis is that the consistency of visibility across AI surfaces is influenced by RAG: since the file ranks organically, search-dependent AI systems are also picking it up.

For those who are looking to optimize for AI via LLMs.txt, this suggests that it's worth thinking strategically about the text and information that's included in the file. Guidance from the developers of the protocol recommends that the file should be “human and LLM readable,” and this evidence suggests that RAG optimization could lean into the “human readable” sections of this content.

Truth: LLMs.txt can surface independently of RAG

There’s evidence that Perplexity may be surfacing LLMs.txt independently of RAG.

In the case of weather.com, the contents of the LLMs.txt were not discoverable by search. When I carried out a verbatim search for a specific phrase in the file, Google returned zero website results.

Google search results page with no matches for a query about location-specific weather forecasts. White background, gray text box.

Despite this, when I asked the same question on Perplexity, it returned the LLMs.txt as the top source, as an in-line citation, and as the first page under “Steps.”

Text details a URL pattern for location-specific weather forecasts on weather.com, with highlighted sections and search query text.

Search query on webpage for location-specific forecasts URL pattern on weather.com, highlighting "llms.txt" source in orange.

This suggests that the file was not accidentally recalled but was used as a relevant resource. When I asked the same question on other platforms, they needed to carry out a number of steps to arrive at the same conclusion. Perplexity was the most efficient. AI platforms aim to become ever more efficient and reduce the compute required to arrive at an answer, so it makes sense that as tools improve they will be more likely to use these files.

Myth: LLMs.txt files are the same as every other page on the web

Some skeptics argue that LLMs.txt is treated the same as any other page. Some ask why it needs to be at the root of the domain and why it needs to be called LLMs.txt, and not literally anything else. These are perfectly valid questions. Let's address them.

Truth: The TXT file format is efficient for LLMs

Technically speaking, LLMs.txt files are txt files written in markdown, and markdown is not new. People have created text files in markdown for years.

But txt files are valuable to LLMs because they are computationally efficient. At the moment, LLMs need to scrape the raw HTML of a webpage to get to the language they need. But the raw source of a page includes HTML markup, CSS, and JavaScript. So, when they do this, they end up tokenizing lots of code they don’t need.

Text files only contain (you might have guessed) text, thus making tokenization more efficient. These files help LLMs get to what they need with fewer tokens, using fewer resources and costing less money.

Don’t believe me? Here’s an experiment you can try for yourself. I went to OpenAI’s Tokenizer tool and compared the token count of the LLMs.txt for llmstxt.org with that of the raw HTML for the homepage. The LLMs.txt required 95x fewer tokens than the fairly simple homepage.

Side-by-side text comparison of tokenized LLMS.txt vs. raw HTML. HTML has more tokens and characters. Notable colors highlight syntax.
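If you'd rather see the effect in code than in a web tool, here is a minimal sketch of the same comparison. It rests on loud assumptions: the two samples are invented, and the regex word-and-punctuation counter is only a crude stand-in for a real model tokenizer (actual counts from OpenAI's Tokenizer will differ). What it illustrates is that markup, styles, and scripts inflate the count without adding any readable content.

```python
import re

# Invented sample: a small HTML page with typical markup, CSS, and JavaScript.
HTML_PAGE = """<!DOCTYPE html>
<html>
<head>
<title>Example Docs</title>
<style>body { font-family: sans-serif; margin: 0; } .nav { display: flex; }</style>
<script>window.dataLayer = window.dataLayer || []; function track(e) { dataLayer.push(e); }</script>
</head>
<body>
<div class="nav"><a href="/docs">Docs</a><a href="/api">API</a></div>
<h1>Example Docs</h1>
<p>Guides and API reference for Example.</p>
</body>
</html>"""

# Invented sample: the same information as a markdown text file.
LLMS_TXT = """# Example Docs

> Guides and API reference for Example.

- [Docs](/docs)
- [API](/api)
"""


def rough_token_count(text: str) -> int:
    """Count words and punctuation marks as a crude proxy for model tokens."""
    return len(re.findall(r"\w+|[^\w\s]", text))


html_tokens = rough_token_count(HTML_PAGE)
md_tokens = rough_token_count(LLMS_TXT)

print(f"HTML page: ~{html_tokens} rough tokens")
print(f"llms.txt:  ~{md_tokens} rough tokens")
print(f"Ratio:     ~{html_tokens / md_tokens:.1f}x")
```

The exact ratio depends entirely on the page, so treat the output as a direction rather than a benchmark.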

The difference in computational cost is one of the reasons LLMs.txt markdown is being used so heavily in agent-to-agent architecture and why many developers are making extensive use of markdown on their sites.

Even if that's the case, you might wonder, why can't you use any markdown file? Well, you could. But within the protocol, there’s clear value in establishing a convention for where those files are located. Which brings me to my next point…

Truth: Naming conventions matter for web protocols

Every proposed and existing web protocol has rules, syntax, and conventions; you might even call them standards. Naming is a critical part of this. Imagine if someone put the same content at robotic.txt instead of robots.txt—it wouldn’t have the same impact because no one would go looking for it there. In the same way, the location of the LLMs.txt file is part of the value of the proposed protocol.

Every time a human user, web crawler, LLM, or AI agent views an LLMs.txt file they will expect to see a markdown file that has been formatted in a similar structure, with “LLM friendly content.” In some cases, like Gemini’s Developer API Docs, the file is used as a directory for other md.txt files, but the LLMs.txt file is the masthead for this content discovery.

Text document listing Gemini Developer API Docs, featuring links to resources and guides for Google AI Studio and Gemini API functionalities.
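Because the value of the convention lies in predictable structure, a reader (human or machine) can parse these files with very little code. The sketch below assumes the layout described in the proposal at llmstxt.org as I understand it: an H1 title, a blockquote summary, and H2 sections containing markdown link lists with optional notes after a colon. The sample file and the `parse_llms_txt` helper are hypothetical, and real files vary.

```python
import re

# Matches "- [name](url)" with optional ": notes" after the link.
LINK = re.compile(r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<notes>.*))?$")

# Hypothetical file used only to illustrate the layout.
SAMPLE = """# Example Project

> A hypothetical project used to illustrate the layout.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): Install and run
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt-style file into a title, a summary, and link sections."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                doc["sections"][current].append(
                    (match["name"], match["url"], match["notes"] or "")
                )
    return doc


parsed = parse_llms_txt(SAMPLE)
print(parsed["title"])
for section, links in parsed["sections"].items():
    print(section, [name for name, _url, _notes in links])
```

A parser this small only works because the file name, location, and layout are agreed in advance, which is the point of having a convention.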

Yeah, it's true that LLMs.txt could have been called something else. But it wasn’t. There are now more than 60k files around the web (indexed and unindexed) with this name. That matters.

Myth: LLMs.txt always sits at the root of a domain

It's objectively not true that the LLMs.txt must sit at the root of the domain. It’s certainly the most common location of the file, but in my research of over 1,400 LLMs.txt files, I found that:

Only 62% of LLMs.txt files are located at the root of a domain
10% of LLMs.txt files are in subfolders, and
28% of LLMs.txt files are on subdomains

Pie chart of LLMs.txt location distribution: 62% root domain (blue), 28.1% subdomains (green), 9.9% subfolders (orange).
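For anyone who wants to run a similar breakdown on their own list of URLs, here is a minimal sketch of one way to bucket them. The rules are my assumptions for illustration, not the method behind the figures above: any host with more than two labels (after dropping "www.") counts as a subdomain, which misclassifies suffixes like .co.uk, and a file in a subfolder of a subdomain is counted as a subdomain. The URLs are made up.

```python
from collections import Counter
from urllib.parse import urlparse


def classify_location(url: str) -> str:
    """Bucket an llms.txt URL as 'root', 'subfolder', or 'subdomain' (naive rules)."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Naive assumption: more than two host labels means a subdomain.
    if host.count(".") >= 2:
        return "subdomain"
    # "/llms.txt" has one slash; anything deeper sits in a subfolder.
    if parts.path.rstrip("/").count("/") > 1:
        return "subfolder"
    return "root"


# Hypothetical URLs for illustration only.
urls = [
    "https://example.com/llms.txt",
    "https://www.example.org/llms.txt",
    "https://example.net/docs/llms.txt",
    "https://docs.example.com/llms.txt",
]

counts = Counter(classify_location(u) for u in urls)
for location, count in counts.most_common():
    print(f"{location}: {count / len(urls):.0%}")
```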

LLMs.txt is commonly used to supplement documentation, so it tends to sit alongside these folders. Anthropic advises developers building for agentic use that “LLM-friendly documentation can commonly be found in flat llms.txt files on official documentation sites.”

And in some cases, like for Wix MCP partner Stripe, the LLMs.txt file is linked from the footer of the documentation page to make it easy for humans and bots to find.

Browser window showing docs.stripe.com with links to Contact Support, changelog, Contact Sales, and an orange box highlighting Read llms.txt.

Myth: LLMs.txt adoption is driven solely by visibility optimizations

There are thousands of LLMs.txt files being used around the web. Some people are using them to gain visibility, but there are many other ways that LLMs.txt can add value to your website with regard to how LLMs and AI tools access your content. This versatility is key to why it's being adopted. Contributing factors include:

User demand
LLMs.txt generators
Agent-to-agent architecture

Truth: User demand is driving adoption

Web users and SEOs around the globe are fascinated by the idea of LLMs.txt. Google searches for the term “llms.txt” hit around 90k in August as Wix rolled out its automatic LLMs.txt generator for eCommerce sites. That’s contributed to a 94% increase compared to the previous three months. And there are over 26,000 Reddit results on Google for the topic, suggesting genuine human interest in this tool.

Line graph titled LLMS.TXT shows search volume over 12 months. Peak in August. 38K searches last month. 94% increase this quarter.

This customer interest is something that has driven my team to roll out LLMs.txt for our Wix and Wix Studio users. This scale of interest also suggests that if you have not implemented LLMs.txt, your competitors may have.

Truth: LLMs.txt generators make adoption more uniform and lower lift

Unlike previous rollouts of web protocols like robots.txt, originally introduced in 1994, the rollout of LLMs.txt is being implemented systematically through generator tools and native integrations.

Native CMS integrations, like Wix’s automatic LLMs.txt generation, add the file directly to the root of the parent domain and use data from the site to create it. Updates to the LLMs.txt align with updates to the sitemap and product inventory on the domain and are refreshed every 24 hours.

LLMs.txt generators can also be found in many AI search visibility tools. XFunnel, for instance, has a built-in generator that crawls your website and generates a file for manual upload. As founders Neri Bluman and Beeri Amiel say in a blog post, “if we want AI to use our content effectively, we should feed it in a simpler, more concentrated form.”

With these automated tools, the effort needed to create an LLMs.txt file is reduced to the click of a button.
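To give a feel for what such a generator does under the hood, here is a minimal sketch that turns structured site data into an LLMs.txt-style file. It is not Wix's or XFunnel's implementation: the `build_llms_txt` function, the input shape, and the example.com store are all hypothetical, and the output layout assumes the H1, blockquote, and H2 link-list structure proposed at llmstxt.org.

```python
def build_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render a title, summary, and link sections as llms.txt-style markdown.

    `sections` maps a heading to a list of (name, url, notes) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, notes in pages:
            entry = f"- [{name}]({url})"
            if notes:  # notes are optional
                entry += f": {notes}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data, e.g. pulled from a sitemap or product catalog.
site_sections = {
    "Products": [
        ("Blue mug", "https://example.com/products/blue-mug", "Ceramic, 350 ml"),
        ("Red mug", "https://example.com/products/red-mug", ""),
    ],
    "Policies": [
        ("Shipping", "https://example.com/shipping", "Delivery times and costs"),
    ],
}

llms_txt = build_llms_txt(
    "Example Store",
    "A hypothetical online store selling mugs.",
    site_sections,
)
print(llms_txt)
```

Rerunning a function like this whenever the underlying site data changes is one simple way to keep the file in step with the sitemap or inventory.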

Compared to other early-stage proposed protocols, LLMs.txt is implemented more consistently across domains. For instance, although security.txt is a recognized and supported protocol, a survey of sites with security.txt shows that some implement it as a txt file and some as HTML. It's promising that LLMs.txt gained mainstream support for implementation and creation so quickly after being developed.

Truth: The agentic web uses LLMs.txt

Some of the most consistent uptake of LLMs.txt comes from the teams behind agent-to-agent (A2A) tools. Building for an agentic future, teams at Google, AWS, Anthropic, Perplexity, Microsoft, and OpenAI have all implemented LLMs.txt specifically for the purpose of improving agent-to-agent communication. Anthropic recommends using LLMs.txt in their Writing for Agents guidance.

Text excerpt explaining prototyping tools, featuring bold headings and hyperlinks, formatted on a plain background, including "llms.txt".

OpenAI has an LLMs.txt for their Agents SDK and for their Agentic Commerce Protocol.

OpenAI Agents SDK documentation page, featuring an introduction to the SDK, multi-agent workflows, installation, and links for more details.

Gartner reported that 64% of technology executives are planning to deploy agentic AI over the next 24 months, which means optimizing for agentic use of websites will become increasingly important.

Cost of tokenization is critical for agentic users. For example, taking a price of $0.10 per 1 million tokens for Gemini’s 2.5 Flash model and drawing on my earlier example of llmstxt.org, the LLMs.txt could be processed about 95 times for the price of one HTML homepage tokenization.

LLMs.txt vs Homepage Tokenization Cost

A review of llmstxt.org

File Type	Tokens	Cost per Tokenization (USD)	Cost per 1M Tokens (USD)	Tokenizations per $1
llms.txt	158	$0.0000158	$0.10	63,291
Homepage HTML	15,085	$0.0015085	$0.10	663
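The arithmetic behind a comparison like this is simple enough to sketch. The token counts are the two reported for llmstxt.org; the flat price per million tokens is an assumption for illustration, so swap in the current rate for whichever model you use.

```python
# Assumed flat input price, for illustration only; check current model pricing.
PRICE_PER_MILLION_TOKENS_USD = 0.10


def cost_per_run(tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Cost in USD of tokenizing a document once at a flat per-token price."""
    return tokens * price_per_million / 1_000_000


def runs_per_dollar(tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """How many times a document can be tokenized for one US dollar."""
    return 1 / cost_per_run(tokens, price_per_million)


llms_txt_tokens = 158      # token count reported for the llmstxt.org LLMs.txt
homepage_tokens = 15_085   # token count reported for the llmstxt.org homepage HTML

for label, tokens in [("llms.txt", llms_txt_tokens), ("Homepage HTML", homepage_tokens)]:
    print(f"{label}: ${cost_per_run(tokens):.7f} per run, "
          f"{runs_per_dollar(tokens):,.0f} runs per $1")

ratio = cost_per_run(homepage_tokens) / cost_per_run(llms_txt_tokens)
print(f"Homepage costs about {ratio:.1f}x more per run")
```

At any single flat rate, the cost ratio is just the token ratio (15,085 / 158, roughly 95), so the price only changes the absolute figures, not the relative saving.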

The agents will bear the cost of accessing a website, and if the tokenization cost is too high, they may choose another site. This suggests that LLMs.txt could be crucial for ensuring that websites are part of the agentic web of the future.

Truth: There isn’t formal “support” for LLMs.txt

Despite this agentic use, aside from Anthropic’s developer guidance, there is very little in the way of formal “support” for LLMs.txt. As I mentioned, all the major players in AI search are actively using LLMs.txt. Google, OpenAI, Microsoft, Nvidia, AWS, and Anthropic all use LLMs.txt on their sites. Colloquially, if you use something, you kind of support it.

But “support,” in the case of a major SaaS team, means things like providing knowledge base articles, speaking on best practices, and maintaining systems that update with the tool. That’s a lot of investment for a newly proposed protocol that’s a year old. That could change.

LLMs.txt is often, I would argue incorrectly, compared to robots.txt in function. But the two are similar in one respect: the robots exclusion protocol also took a long time to become official. Google supported and used it, but it was proposed by Martijn Koster in 1994 and didn’t become a formal standard until 2022, after three years of review. That’s 28 years.

LLMs.txt has been around for just over a year and has more indexed files than security.txt and app-ads.txt, which are both supported.

Bar chart showing "Pages indexed in Google by txt file" with robots.txt highest at 450K, sitemap.txt at 300K, others under 50K.

So does LLMs.txt help with generative engine optimization?

It depends.

If after reading all of this you come to the conclusion that LLMs.txt does nothing for GEO or AI visibility, then you’ve missed the scale of the opportunity. At the same time, let’s be clear, this won't make or break your GEO strategy.

But it is true that LLMs.txt can be a valuable part of your overall efforts because:

Indexable LLMs.txt can be accessed and used in LLMs via RAG
There's evidence of LLMs.txt being accessed independently of RAG
LLMs.txt makes your site more accessible to the agentic web of the future

My advice? Test, iterate, and listen to your human and AI users.
