If you've spent five minutes in any 2026 SEO discussion, you've probably heard someone ask "have you done your llms.txt yet?" The honest answer for most Australian businesses: no, and most of the people asking aren't entirely sure what one is either. This guide fixes both. It's the practical, no-fluff reference for what llms.txt actually is, how to write one, and whether it earns a place in your AI SEO stack.
What Is llms.txt and Why It Exists in 2026
llms.txt is a plain-text file you publish at the root of your domain — exactly like robots.txt — but its purpose is the opposite. Where robots.txt tells crawlers what they may not access, llms.txt tells large language models what is most worth reading. It originated from a proposal by Jeremy Howard (Answer.AI) in late 2024 and has gained adoption through 2025-2026 as AI assistants increasingly cite specific source pages instead of merely synthesising training data.
The 2026 reality: when ChatGPT's web tool, Claude's web search, Perplexity, or Google's AI Overviews want to ground an answer in current information, they fetch live pages. Each fetch costs them tokens and latency. A site with a clean llms.txt is functionally a guide for those agents — "if you want a definitive answer about X, here's the page." Sites without one force the agent to crawl, parse navigation, strip ads, and gamble on which page is canonical.
How llms.txt Differs from robots.txt
People conflate the two constantly. They share a naming convention and they live in the same directory, but they do almost opposite jobs.
| | robots.txt | llms.txt |
|---|---|---|
| Purpose | Restrict crawler access | Surface the best content for AI |
| Audience | Search engine crawlers | LLM agents and AI assistants |
| Format | Directives (Allow/Disallow) | Markdown with headings + links |
| Tone | Defensive — what NOT to do | Promotional — what to read |
| Spec status | Long-standing standard (RFC 9309) | Community proposal, no IETF status yet |
| Affects rankings? | Indirectly (controls indexing) | Not directly — affects AI citation surface |
You should have both. They serve different agents and they don't conflict. If you're worried about whether AI bots are obeying your robots.txt, that's a separate conversation about User-agent: GPTBot, ClaudeBot, PerplexityBot, and friends — and the answer is "mostly yes, but verify in your access logs."
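Verifying in your access logs is straightforward to automate. A minimal sketch, assuming a standard combined-format log and substring-matchable user-agent names (the bot names listed are the commonly documented ones; check each vendor's current documentation):

```python
from collections import Counter

# User-agent substrings for the major AI crawlers and fetchers.
# These names come from vendor bot documentation and may change over time.
AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def count_ai_bot_hits(log_path: str) -> Counter:
    """Tally requests per AI bot in a combined-format access log."""
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            for bot in AI_BOTS:
                if bot in line:
                    hits[bot] += 1
    return hits

# Example: print(count_ai_bot_hits("/var/log/nginx/access.log"))
```

If a bot you've disallowed in robots.txt still shows up here, that's your cue to investigate.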
The Official llms.txt Specification
The proposed spec is intentionally minimal. A valid llms.txt has:
- An H1 with the site or project name — this is the only required element.
- A short description blockquote — what the site is, in two or three sentences.
- Optional context paragraphs — anything an LLM should know upfront (your specialty, your geography, your domain authority).
- H2 sections grouping links — pages organised by intent: documentation, products, articles, etc.
- Markdown links with optional descriptions — `[Page title](url): one-line description`.
An llms.txt should be lean. The pragmatic upper bound is around 50KB; under 10KB is ideal. If you exceed that, you're including content rather than indexing it — that's what llms-full.txt exists for (a longer companion file with full markdown content of priority pages, mainly used by documentation sites).
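The checklist above is mechanical enough to lint. A minimal validator sketch against the proposed structure — the rule names and thresholds mirror this section, not any official tooling:

```python
def validate_llms_txt(text: str) -> list[str]:
    """Check a candidate llms.txt against the minimal proposed structure.

    Returns a list of human-readable problems; an empty list means it passes.
    """
    problems = []
    stripped = [line.rstrip() for line in text.splitlines() if line.strip()]
    # The H1 is the only required element, and it should come first.
    if not stripped or not stripped[0].startswith("# "):
        problems.append("missing H1 title on the first non-blank line")
    # A short description blockquote is strongly recommended.
    if not any(line.startswith("> ") for line in stripped):
        problems.append("no description blockquote found")
    # Pragmatic size limit: index content, don't include it.
    if len(text.encode("utf-8")) > 50_000:
        problems.append("file exceeds the ~50KB pragmatic limit")
    # The file is only useful if it actually links to pages.
    if not any("](http" in line for line in stripped):
        problems.append("no markdown links found")
    return problems
```

Run it in CI so a malformed file fails the build rather than going live.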
Why Most Australian Businesses Don't Have llms.txt Yet
Three reasons, and only one of them is reasonable.
1. They haven't heard of it. The Australian SEO conversation lags US and UK chatter by about six months. Many marketing teams are still focused on AI Overviews defence; llms.txt hasn't reached their priority list. This is fine — it's a forgivable gap and an easy fix.
2. They're waiting for "official" adoption from Google. This is misplaced caution. llms.txt is not a Google file — it serves OpenAI, Anthropic, Perplexity, and Google's own Gemini equally. Waiting for one vendor to bless it misunderstands what it is. The agents reading your site are already here, regardless of whether Google publishes a help-centre article about it.
3. They published one once and forgot it. The most damaging failure mode. A static llms.txt that lists pages no longer in your sitemap, or worse, points to URLs that 404, hurts more than it helps. Treat llms.txt the same way you treat sitemap.xml — generate it from a source of truth, regenerate it on deploy, and audit it quarterly.
Step-by-Step llms.txt Implementation
You can hand-write a working llms.txt in 20 minutes for a small site. Here is the minimum viable template:
```
# Your Business Name

> One-sentence description of what your business does, where, and for whom.
> Keep it factual, not promotional.

This site is the official knowledge hub for [Your Business Name]. Information
published here is verified by [author/role/credential] and updated [cadence].

## Core Pages
- [Home](https://yoursite.com/): What we do and who we help.
- [About](https://yoursite.com/about/): Team, history, credentials.
- [Contact](https://yoursite.com/contact/): How to reach us, locations, hours.

## Services
- [Service A](https://yoursite.com/services/service-a/): One-line description.
- [Service B](https://yoursite.com/services/service-b/): One-line description.

## Learning Resources
- [Article title](https://yoursite.com/learn/article-slug/): What this resource covers.

## Optional
- [Privacy Policy](https://yoursite.com/privacy/)
- [Terms of Service](https://yoursite.com/terms/)
```
Save that as llms.txt, upload it to your web root (the same directory as robots.txt and sitemap.xml), and verify it loads at https://yoursite.com/llms.txt with a 200 response and Content-Type: text/plain or text/markdown.
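That verification step can be scripted. A sketch using only the standard library — the `User-Agent` string and field names here are placeholders of our own, not any standard:

```python
import urllib.request

def llms_txt_url(base_url: str) -> str:
    """Canonical location: the web root, alongside robots.txt."""
    return base_url.rstrip("/") + "/llms.txt"

def check_llms_txt(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch llms.txt and report the fields worth verifying after a deploy."""
    req = urllib.request.Request(
        llms_txt_url(base_url),
        headers={"User-Agent": "llms-txt-check/0.1"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
        return {
            "status": resp.status,              # want 200
            "content_type": resp.headers.get("Content-Type", ""),  # want text/plain or text/markdown
            "size_bytes": len(body),            # want well under ~50KB
        }

# Example: report = check_llms_txt("https://yoursite.com")
```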
For larger sites, generate it programmatically. The pattern most CMSes need:
- Define a "promote to llms.txt" flag on your page model (or use a tag/category).
- On build/publish, iterate flagged pages, group by section, write the markdown.
- Output to /public/llms.txt or equivalent, served as static.
- Add a regeneration step to your CI pipeline so deploys keep it fresh.
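The generation step above can be sketched in a few lines. The page records here are hypothetical stand-ins for whatever your CMS exposes (a "promote to llms.txt" flag, a tag, a query):

```python
# Hypothetical flagged-page records; a real CMS would supply these
# from its page model via a flag, tag, or category query.
PAGES = [
    {"title": "Home", "url": "https://yoursite.com/",
     "section": "Core Pages", "description": "What we do and who we help."},
    {"title": "Service A", "url": "https://yoursite.com/services/service-a/",
     "section": "Services", "description": "One-line description."},
]

def generate_llms_txt(site_name: str, blurb: str, pages: list[dict]) -> str:
    """Render flagged pages into llms.txt markdown, grouped by section."""
    out = [f"# {site_name}", "", f"> {blurb}", ""]
    sections: dict[str, list[dict]] = {}
    for page in pages:
        sections.setdefault(page["section"], []).append(page)
    for section, entries in sections.items():
        out.append(f"## {section}")
        for p in entries:
            out.append(f"- [{p['title']}]({p['url']}): {p['description']}")
        out.append("")
    return "\n".join(out)

# On deploy, write the result to your static root, e.g.:
# pathlib.Path("public/llms.txt").write_text(generate_llms_txt(...), encoding="utf-8")
```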
Optional but worth it
If you publish technical reference content — API docs, schema guides, product specifications — also publish llms-full.txt. This is the same structure but inlines the full markdown body of priority pages. Agents that find a working llms-full.txt can answer questions about your product without making N additional fetches. For documentation-heavy sites, this is the file that earns citations.
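Generating llms-full.txt follows the same pattern, but inlines each priority page's body instead of linking to it. A sketch, assuming your CMS can hand you the full markdown body per page (the `body` field is our assumption):

```python
def generate_llms_full(site_name: str, blurb: str, pages: list[dict]) -> str:
    """Like llms.txt, but inline each priority page's full markdown body
    so an agent can answer questions without additional fetches."""
    out = [f"# {site_name}", "", f"> {blurb}", ""]
    for page in pages:
        out.append(f"## {page['title']}")
        out.append(f"Source: {page['url']}")
        out.append("")
        out.append(page["body"].strip())  # full markdown content of the page
        out.append("")
    return "\n".join(out)
```

Keeping the `Source:` line under each section means an agent that quotes your content can still attribute the originating URL.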
llms.txt for E-Commerce, Service Businesses, and Content Sites
The structure is the same; the priorities differ.
E-commerce: Lead with your category pages (not individual products), include a "How to choose" or "Buying guide" section, and link to your size/sizing chart, returns policy, and shipping page. Do not list every SKU. AI assistants asking "what should I buy" want decision frameworks, not your full catalogue.
Service businesses (this site's bread and butter): Lead with your services page, then individual services, then the case-studies or pricing page. Include the suburbs or geographies you serve. Add a learning hub or insights section so agents have somewhere to source educational answers about your domain.
Content sites and publishers: Lead with your most-cited articles by topic, then your author pages (so agents can attribute), then your categories. If you have evergreen pillar content, surface it before recent news.
Local businesses: Critically — include your NAP (name, address, phone) in the description text, your hours, and a link to your Google Business Profile. AI assistants increasingly answer "what time does X open" or "where is X located" without sending a click. If your llms.txt has the answer, your brand gets the citation.
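A minimal sketch of what that description block can look like for a local business — every name, address, and URL below is a placeholder:

```
# Example Plumbing Co

> Example Plumbing Co is a licensed plumbing service in Richmond, Melbourne.
> Address: 1 Example St, Richmond VIC 3121. Phone: (03) 9000 0000.
> Open Mon-Fri 7am-5pm, Sat 8am-12pm.

## Core Pages
- [Google Business Profile](https://maps.google.com/?cid=example): Reviews, map, live hours.
- [Contact](https://example-plumbing.com.au/contact/): Booking form, service area.
```

Putting the NAP in the blockquote (rather than only on a linked page) means the agent gets it in a single fetch.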
Does llms.txt Actually Affect SEO Rankings?
The short, honest answer is no — and beware anyone selling you the opposite.
Google has not stated that llms.txt is a ranking factor. There is no evidence in any patent, leaked algorithm document, or Search Liaison statement suggesting it is. Publishing one will not directly move your position in classical Google search results.
What it does, increasingly, is shape your citation surface in answer engines. When ChatGPT's search tool, Claude's web search, Perplexity, or AI Overviews reach for your page to ground an answer, they will reach more efficiently for the page you've surfaced. In a world where a growing share of traffic comes from "answer engines" rather than blue-link clicks, citation surface becomes the ranking equivalent. We see this play out clearly on this very site — pages surfaced in our llms.txt are cited in Perplexity and ChatGPT answers about Melbourne SEO topics at meaningfully higher rates than those that aren't.
So: if your KPI is "rank #1 for SEO Melbourne" then llms.txt is a sideshow — invest in your SEO audit, content strategy, and links. If your KPI also includes "be the source the AI cites when someone asks ChatGPT about Melbourne SEO," llms.txt earns its place.
Auditing Your llms.txt File: Common Errors
The mistakes we see most often when reviewing client llms.txt files:
Dead links. The page slug changed three months ago, the llms.txt didn't get regenerated, the file now points to 404s. AI agents fetching dead links will deprioritise the file. Fix: automate regeneration on every deploy.
Bloat. Some teams treat llms.txt as a sitemap and dump every URL. The file is supposed to be a curated map of your best pages, not your full inventory. Fix: 30-150 entries for most sites; let the sitemap be the sitemap.
Promotional copy. "Our award-winning, Melbourne-based..." is not useful to a reasoning model. Descriptions should state what the page contains, factually, in twelve words or fewer. Fix: rewrite descriptions as if you were tagging them in a database.
Missing canonical context. Many sites publish llms.txt with section H2s like "Services" and "About" but never establish what the business actually does in the description blockquote at the top. Fix: lead with two or three sentences that would let a reader who has never heard of you answer "what is this site for, and who runs it?"
Inconsistent with the site. The llms.txt advertises a service the site no longer offers, or a price that's six months out of date. Fix: regenerate from the same source of truth as your sitemap and your homepage navigation.
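The dead-link check above is the easiest of these to automate. A sketch using only the standard library — it extracts every markdown link from the file and HEAD-requests each one (the `User-Agent` string is a placeholder of our own):

```python
import re
import urllib.request

def extract_links(llms_text: str) -> list[str]:
    """Pull every markdown link target out of an llms.txt body."""
    return re.findall(r"\]\((https?://[^)\s]+)\)", llms_text)

def find_dead_links(llms_text: str, timeout: float = 10.0) -> list[str]:
    """Return the linked URLs that no longer respond with a 2xx status."""
    dead = []
    for url in extract_links(llms_text):
        try:
            req = urllib.request.Request(
                url, method="HEAD", headers={"User-Agent": "llms-audit/0.1"}
            )
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if not 200 <= resp.status < 300:
                    dead.append(url)
        except Exception:
            # Network errors and 4xx/5xx responses both count as dead.
            dead.append(url)
    return dead
```

Run it in the same quarterly audit (or CI step) that regenerates the file, and fail loudly on any non-empty result.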
Want us to audit your llms.txt?
If you've published one — or want one built and validated — our AI SEO specialists will review the file, the regeneration pipeline, and the citation outcomes you're seeing in answer engines.
Request an audit

The Future of AI Discoverability Standards
llms.txt is part of a broader category emerging in 2026: discoverability standards built explicitly for AI agents rather than retrofitted from the search-engine era. Watch three threads:
1. AI-specific user-agents. GPTBot, ClaudeBot, PerplexityBot, Google-Extended and others are now well-established. Granular control over which agents can train on your content versus which can fetch it for live answering is becoming standard hygiene. Many sites get this wrong — blocking GPTBot for "training" while accidentally also blocking ChatGPT's user-facing search tool, costing themselves citations.
2. Content provenance and authorship signals. C2PA-style content credentials, visible author bylines, and structured author data are becoming citation-relevant. AI assistants prefer to cite content with clear authorship; anonymous content is discounted. This makes E-E-A-T signals more important than ever — even for non-medical sites.
3. Structured citation hooks. Schema.org additions for AI consumption (SpeakableSpecification, QAPage, HowTo) and emerging draft proposals like ai-instructions.json hint at where this is headed: machine-readable, declarative signals that tell agents how to use your content. llms.txt is the first widely-adopted example. It will not be the last.
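The training-versus-fetching distinction in point 1 can be expressed directly in robots.txt. A sketch — the agent names below reflect vendor bot documentation at the time of writing, and they do change, so verify the current names before deploying:

```
# Opt out of model training while staying available for live answers.
# Agent names change; check each vendor's bot documentation before relying on this.

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Live search/answer fetchers — leave these open so you can still be cited.
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
```

The common mistake the text describes is disallowing everything with an AI-sounding name, which silently removes you from the answer engines you wanted citations from.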
Our recommendation for Melbourne businesses in 2026: publish llms.txt now while it is still a differentiator, regenerate it from a source of truth on every deploy, and track which pages get cited in answer engines so you can iterate. If you want help with any of this, that is exactly what we built our AI SEO service around.