Coding Intelligence:
The Missing Training Data
We're building reasoning-rich training data for coding agents by extracting architectural knowledge from deep technical conversations at scale.
The Problem
Coding agents have a well-documented data problem. Research shows agent advancement is "fundamentally constrained by the scarcity of high-quality training data."
GPT-5 with OpenHands scores 65% on SWE-Bench Verified but drops to 21% on SWE-EVO, which tests sustained, multi-file reasoning: the kind that requires understanding why a system is designed a certain way, not just generating a function.
Our Thesis
PlanSearch (ICLR 2025 Spotlight) demonstrated that reasoning in natural language before generating code dramatically improves output. If natural-language reasoning improves coding performance, and the field is starved for reasoning-rich training data, then where does that data come from?
Deep technical conversations.
Podcasts, architecture discussions, long-form developer dialogue. This is where engineers reason through tradeoffs, explain system design, and work through complex problems out loud. It's a rich, largely untapped source of exactly the signal the research says is missing.
What We're Building
We've built the infrastructure to extract this data at scale on Subnet 33. Our enrichment pipeline pulls architectural reasoning and contextual knowledge from conversational data — the kind of signal that helps models understand not just what to code, but why.
- Reasoning-rich coding data extracted from thousands of technical conversations
- Improvements on the hardest coding evaluations: SWE-Bench, SWE-EVO, and more
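To make the idea concrete, here's a minimal sketch of the kind of first-pass filter such a pipeline could apply to transcript turns before deeper enrichment. The marker list, threshold, and function names are illustrative assumptions for this sketch, not our actual implementation.

```python
# Hypothetical first-pass filter: flag transcript turns dense in
# reasoning language, as candidates for downstream enrichment.
# Markers and threshold are illustrative, not a production setting.
REASONING_MARKERS = (
    "because", "tradeoff", "instead of", "the reason",
    "so that", "the downside", "we chose",
)

def reasoning_score(turn: str) -> int:
    """Count occurrences of reasoning markers in one speaker turn."""
    text = turn.lower()
    return sum(text.count(marker) for marker in REASONING_MARKERS)

def extract_candidates(turns: list[str], min_score: int = 2) -> list[str]:
    """Keep only turns whose marker density clears the threshold."""
    return [t for t in turns if reasoning_score(t) >= min_score]

turns = [
    "We went with Postgres instead of a queue because we needed "
    "exactly-once semantics, and the tradeoff was acceptable.",
    "Yeah, sounds good to me.",
]
print(extract_candidates(turns))  # keeps only the first turn
```

A real pipeline would replace this keyword heuristic with model-based scoring, but the shape is the same: score each turn for architectural reasoning, keep the dense ones, and pair them with the code they discuss.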
Early results are promising, and we'll be sharing benchmark numbers on these evaluations soon.