Research Agenda: Evidence, Decisions, and the Conditions That Connect Them
My research is about the relationship between evidence and decisions: specifically, the conditions under which data use in public institutions measurably affects outcomes, and the conditions under which it does not.
This is not primarily a data science question. It is an empirical social science question that data scientists are unusually poorly positioned to answer honestly, because they have strong professional incentives to overstate the influence of their work.
I have spent twenty years doing exactly the kind of work I want to study: building and evaluating complex behavioral models whose outputs are supposed to shape how billions of dollars in infrastructure gets allocated. My honest experience is that data often matters less than the professional norms of planning suggest, and that the moments when it does matter have identifiable structural properties worth understanding.
Thread 1: Does Data Use in Planning Measurably Affect Decisions?
Evidence-based medicine had to answer a prior question before it could become a field: did physicians’ use of evidence actually affect patient outcomes? The answer was not obvious and the literature was humbling.
Planning has largely skipped this prior question. The field has invested heavily in data collection and analytical methods while producing relatively little rigorous research on whether those investments change what agencies decide to fund, build, or adopt.
I can identify very few clear instances in twenty years of work where data I produced demonstrably changed a decision. I can identify many instances where data was used decoratively: to justify decisions already made, to satisfy regulatory requirements, or to signal analytical sophistication without substantive influence. That is not a cynical conclusion. It is an empirical observation.
Thread 2: Evidence-Based Planning
Evidence-based medicine developed systematic review as a tool for accumulating and synthesizing evidence across studies. The result was a body of knowledge that allowed practitioners to distinguish interventions with strong evidence from those that were theoretically plausible but empirically weak.
Planning has no equivalent infrastructure. My interest is in developing a framework that evaluates planning interventions along two dimensions: strength of evidence for meaningful population-level outcomes, and political feasibility. The distribution across these dimensions turns out to be analytically interesting: interventions with the strongest evidence (congestion pricing, parking pricing) tend to be politically hard. Interventions that are politically easier (fare-free transit) often have weaker or more uncertain evidence.
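A minimal sketch of what that two-dimensional classification might look like as a data structure, in Python. The intervention names come from this section; the level assignments are illustrative placeholders, not findings, and a real framework would derive them from systematic review rather than assertion.

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class Intervention:
    name: str
    evidence_strength: Level       # evidence for population-level outcomes
    political_feasibility: Level   # ease of adoption in practice

# Placeholder placements, for illustration only.
INTERVENTIONS = [
    Intervention("congestion pricing", Level.HIGH, Level.LOW),
    Intervention("parking pricing", Level.HIGH, Level.LOW),
    Intervention("fare-free transit", Level.LOW, Level.HIGH),
]

for iv in INTERVENTIONS:
    print(f"{iv.name}: evidence={iv.evidence_strength.name}, "
          f"feasibility={iv.political_feasibility.name}")
```

The point of scoring the two dimensions independently is precisely that their apparent inverse correlation becomes visible rather than anecdotal.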
Thread 3: Forecast Accountability and Assumption Auditing
Long-range planning is built on forecasts. Institutions are generally good at producing them and much worse at returning to them. Forecast revisitation is institutionally awkward: it requires acknowledging error, it implicates specific analysts and models, and it is rarely funded.
My own AV scenario work from 2014–2015 provides a case study. Looking back now, it is possible to identify which assumptions aged well, which did not, and which entire dimensions of the problem were missing. The working-from-home case is a canonical example of a correct behavioral prediction producing a wrong system-level inference: the forecast that telework would reduce commute travel was largely correct, but the forecast that this would reduce aggregate VMT was largely wrong. It missed the inelasticity of total auto travel in the U.S. built environment: the time and money freed from commuting are largely reabsorbed by other trips.
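A back-of-the-envelope version of that inference failure, with every number invented for illustration: the behavioral prediction (commute VMT falls) holds, while a rebound term standing in for the inelasticity of total auto travel absorbs most of the savings.

```python
# All numbers are hypothetical, chosen only to illustrate the logic.
daily_vmt = 30.0        # total vehicle miles traveled per person per day
commute_share = 0.30    # fraction of VMT that is commuting
telework_cut = 0.20     # share of commute VMT removed by telework

commute_vmt_saved = daily_vmt * commute_share * telework_cut  # 1.8 mi/day

# If aggregate auto travel is nearly inelastic, freed time and money are
# reabsorbed by other trips; a rebound factor near 1 erases the savings.
rebound = 0.85
net_change = -commute_vmt_saved * (1 - rebound)               # -0.27 mi/day

print(f"Commute VMT avoided:     {commute_vmt_saved:.2f} mi/day")
print(f"Net change in total VMT: {net_change:+.2f} mi/day")
```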
Thread 4: Measurement, Metrics, and What Gets Built
There is a well-developed theoretical literature on how metrics shape what gets optimized — Goodhart’s Law in economics, extensive work on performance management in public administration. What is less developed is the application of this literature to transportation planning, where metric choice has had documented material consequences for what gets built and who gets served.
Three cases illustrate this. Congestion as the organizing goal crowded out learning about land use and access as alternative levers. VMT reduction can be achieved on paper without improving anyone's lived experience. And a decline in mode share does not distinguish improved mobility from economic exclusion, as the sketch below illustrates.
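A toy illustration of the mode share point, reading "mode share" as the share of trips made by car; the trip counts are invented. The metric reads as success in both scenarios, but only one reflects improved mobility.

```python
def drive_mode_share(drive_trips: int, total_trips: int) -> float:
    return drive_trips / total_trips

baseline = drive_mode_share(80, 100)   # 80% of trips by car

# Scenario A, improved mobility: 10 car trips shift to a new transit line;
# the trips are still made, by a better option.
improved = drive_mode_share(70, 100)

# Scenario B, economic exclusion: households lose car access and 10 car
# trips are simply no longer made at all.
excluded = drive_mode_share(70, 90)

print(f"Baseline:  {baseline:.1%}")
print(f"Improved:  {improved:.1%}  (mode share falls)")
print(f"Excluded:  {excluded:.1%}  (mode share also falls)")
```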
The research question: what measurement approaches would better capture whether transportation systems are serving human needs?
Connecting Thread: Evaluation in Complex Behavioral Systems
The problems I have described are not specific to transportation planning. They are general properties of how institutions build and trust complex behavioral systems for consequential decisions.
AI is arriving at the same set of problems, at higher speed and higher stakes. My interest is not in becoming an AI researcher. It is in contributing a methodological perspective that is genuinely underrepresented in current AI evaluation discourse: that of practitioners who have spent careers asking how to evaluate a system whose outputs shape high-stakes public decisions, when the system's behavior cannot be fully understood from benchmark performance alone.
The travel demand modeling community learned this, sometimes painfully, over three decades. That class of problem — evaluation under changed conditions, for a system that shapes consequential public decisions — is the connective tissue across all four threads.
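One compact, synthetic illustration of that class of problem, a sketch rather than anyone's actual model: a system fit under one behavioral regime looks accurate on a benchmark drawn from the same regime, and fails quietly when the regime shifts.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(slope: float, n: int = 1000):
    """Synthetic behavioral regime: outcome = slope * x + noise."""
    x = rng.normal(0.0, 1.0, n)
    return x, slope * x + rng.normal(0.0, 0.5, n)

# Fit a one-parameter model under the original regime (slope = 2).
x_train, y_train = make_data(2.0)
fitted = float(x_train @ y_train) / float(x_train @ x_train)

def rmse(x, y):
    return float(np.sqrt(np.mean((y - fitted * x) ** 2)))

# Benchmark drawn from the same regime: performance looks fine.
print(f"Benchmark RMSE: {rmse(*make_data(2.0)):.2f}")   # about 0.5

# Changed conditions: the behavioral relationship itself has shifted.
print(f"Shifted RMSE:   {rmse(*make_data(0.5)):.2f}")   # about 1.6
```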