llms.txt Files Are Live: The Web, Made Readable for Machines
The llms.txt repository is now live.
SN33 has processed the first batch: over 10,000 websites crawled, cleaned, and converted into structured llms.txt files.
Semantic summaries ready for any LLM agent, MCP server, or AI app to consume instantly. No scraping. No parsing raw HTML. Just clean, machine-readable intelligence.
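As a rough sketch of what that looks like in practice: fetch one pre-built file and drop it straight into a prompt. The base URL and file path below are placeholders, not the repository's confirmed layout; check the repo (linked at the bottom) for the real scheme.

```python
import urllib.request

# Placeholder base URL; substitute the actual repository before running.
RAW_BASE = "https://raw.githubusercontent.com/<org>/<repo>/main"

def fetch_llms_txt(path: str) -> str:
    """Fetch one pre-built llms.txt file: no scraping, no HTML parsing."""
    with urllib.request.urlopen(f"{RAW_BASE}/{path}") as resp:
        return resp.read().decode("utf-8")

# The file drops straight into an agent's context window as-is.
summary = fetch_llms_txt("data/example.com.llms.txt")  # path is illustrative
prompt = f"Answer using this site summary:\n\n{summary}"
```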
New batches will be pushed as the subnet keeps processing. The repo grows every week.
What's in the Dataset
- Structured semantic summaries per domain
- Named entities: people, orgs, products, technologies, concepts
- Topic classification and key themes
- Deterministic O(1) lookup by domain, with no index file needed (see the sketch after this list)
- Git-friendly structure that scales to millions of domains
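How can lookup be O(1) with no index file? One plausible scheme is to derive each file's path deterministically from the domain itself, sharding by a hash prefix so no single directory grows too large. The sketch below is an illustrative assumption, not the repo's confirmed layout: the hash function, shard depth, and file naming are all stand-ins for whatever the repository actually documents.

```python
import hashlib

def llms_txt_path(domain: str) -> str:
    """Map a domain straight to a file path, no index lookup required.

    Assumed layout (illustrative only): files sharded under two levels
    of hash-prefix directories, which keeps every directory small even
    at millions of domains.
    """
    digest = hashlib.sha256(domain.encode("utf-8")).hexdigest()
    return f"data/{digest[:2]}/{digest[2:4]}/{domain}.llms.txt"

print(llms_txt_path("example.com"))
# e.g. data/<xx>/<yy>/example.com.llms.txt
```

Because the path is a pure function of the domain, any client can locate a file in constant time, and that same property is what keeps the structure Git-friendly as the dataset grows.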
This initial release covers ~10,000 domains, but the pipeline scales to millions.
Roadmap
10K → 100K → 1M domains → continuous updates from new Common Crawl releases and, soon, from user-submitted requests.
What's Coming Next
The frontend is coming: request any domain, the subnet processes it, and you get an llms.txt back. We're putting the finishing touches on the public UI.
SN33 is becoming infrastructure. The web, made readable for machines and open to anyone, powered by decentralized compute.
Star the repo, share it, and stay close. The next drop is right around the corner.
View on GitHub