AI, Tacit Knowledge, and the Data We're Not Collecting
Yesterday I went to Stanford for the first time: arboretum-ringed, immaculately landscaped, car infrastructure dressed up as multimodal, and an audience curated as carefully as the landscaping. It struck me as oddly similar to the technology being discussed inside its walls. Both are trained on the world but insulated from it; both take the edges off reality to fit comfortable patterns.
My brilliant friend Flavia Tsang convinced me to step outside my comfort zone and join her at a Hoover Institution panel on what AI is doing to the workforce, featuring Erik Brynjolfsson, Karin Kimbrough, James Manyika, Gina Raimondo, and Rishi Sunak in conversation with Condoleezza Rice.
LLMs lack tacit domain knowledge, and there is a surprising amount of it
One insight I took from the evening: LLMs lack tacit domain knowledge, and there is a surprising amount of it.
In transportation planning, enormous expertise lives outside any document. How do you validate a travel demand model? How do you know if regional survey results are reasonable? How do you recognize that your results are wrong before you have published them? What are the best practices in logit model estimation that distinguish a careful analyst from a careless one?
Almost none of this is written down. It is passed from person to person through apprenticeship, war stories, and hard-won mistakes. AI cannot inherit what was never encoded.
Good public datasets are not a nice-to-have for equitable AI — they are the foundation
Another topic that directly connects to our work: good public datasets are not a nice-to-have for equitable AI; they are the foundation.
Household travel surveys in particular are chronically underfunded relative to their value: they are one of the few tools in planning that come close to a causal understanding of how people actually move through the world, capturing complex decisions that no administrative dataset can touch. The panel raised the loss of federal funding for tracking critical data as a concrete institutional risk.
I see it as more than a risk — it is already happening. We are not collecting the data we need for societal benefit.