Coding Intelligence:
The Missing Training Data
We're building reasoning-rich training data for coding agents by extracting architectural knowledge from deep technical conversations at scale.
The Problem
Coding agents have a well-documented data problem. Research shows agent advancement is "fundamentally constrained by the scarcity of high-quality training data."
GPT-5 with OpenHands scores 65% on SWE-Bench Verified but drops to 21% on SWE-EVO, which tests sustained, multi-file reasoning: the kind that requires understanding why a system is designed a certain way, not just generating a function.
Our Thesis
PlanSearch (ICLR 2025 Spotlight) demonstrated that reasoning in natural language before generating code dramatically improves output. If natural-language reasoning improves coding performance, and the field is starved for reasoning-rich training data, then where does that data come from?
Deep technical conversations.
Podcasts, architecture discussions, long-form developer dialogue. This is where engineers reason through tradeoffs, explain system design, and work through complex problems out loud. It's a rich, largely untapped source of exactly the signal the research says is missing.
What We're Building
We've built the infrastructure to extract this data at scale on Subnet 33. Our enrichment pipeline pulls architectural reasoning and contextual knowledge from conversational data — the kind of signal that helps models understand not just what to code, but why.
- Reasoning-rich coding data extracted from thousands of technical conversations
- Improvements on the hardest coding evaluations: SWE-Bench, SWE-EVO, and more
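To make the idea concrete, here's a minimal sketch of the kind of first-pass filter such a pipeline could apply to transcript turns before deeper enrichment. The marker list, threshold, and function names are illustrative assumptions for this sketch, not our actual implementation.

```python
# Hypothetical first-pass filter: flag transcript turns dense in
# reasoning language, as candidates for downstream enrichment.
# Markers and threshold are illustrative, not a production setting.
REASONING_MARKERS = (
    "because", "tradeoff", "instead of", "the reason",
    "so that", "the downside", "we chose",
)

def reasoning_score(turn: str) -> int:
    """Count occurrences of reasoning markers in one speaker turn."""
    text = turn.lower()
    return sum(text.count(marker) for marker in REASONING_MARKERS)

def extract_candidates(turns: list[str], min_score: int = 2) -> list[str]:
    """Keep only turns whose marker density clears the threshold."""
    return [t for t in turns if reasoning_score(t) >= min_score]

turns = [
    "We went with Postgres instead of a queue because we needed "
    "exactly-once semantics, and the tradeoff was acceptable.",
    "Yeah, sounds good to me.",
]
print(extract_candidates(turns))  # keeps only the first turn
```

A real pipeline would replace this keyword heuristic with model-based scoring, but the shape is the same: score each turn for architectural reasoning, keep the dense ones, and pair them with the code they discuss.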
Early results are promising, and we'll be sharing benchmark numbers on these evaluations soon.