Apr 20, 2026

Drawing Lines of Evidence

From What We Can See to What We Can Prove

A project framework for systematic evidence on transportation interventions

Suzanne Childress | Unicairn | April 2026

Working draft for discussion. Not a proposal to any institution.

I. The Gap We Can See

Most people can recognize the difference between a community that works and one that doesn’t. A place where a teenager can get to school and a job without depending on a parent’s car. Where someone aging out of driving doesn’t lose access to medical care. Where a family can walk to a grocery store without crossing six lanes of traffic at an intersection designed for throughput rather than survival.

And most people can see the opposite: communities organized around automobile throughput at the expense of nearly everything else, where children cannot move independently, where transit is a last resort, where the people who bear the heaviest pollution and safety burden are the people with the fewest alternatives.

The gap between these two conditions is visible, describable, and measurable. But the field that is supposed to close it cannot reliably tell you which interventions actually move communities from one condition toward the other, how large the effects are, under what conditions they hold, or how confident you should be in any specific claim. We can see where we want to go. We cannot draw reliable lines of evidence from the dysfunctional present to the functional future.

This project is about building the capacity to draw those lines honestly.

II. Why This Is So Hard

I want to be direct about the difficulty, because understating it is one of the ways the field has gotten into trouble.

The problem of knowing whether transportation interventions work is approximately as hard as the central problems of empirical economics. Causality is confounded by self-selection (people who choose to live near transit are different from people who don’t), by induced demand (building capacity changes the behavior you’re trying to measure), by system reorganization (removing trips from a network frees capacity that gets consumed by other trips), and by land use feedback loops that operate over decades. Randomized experiments are almost never feasible. Natural experiments are rare and context-specific. Most of what we know comes from observational studies with varying rigor and limited generalizability.

Economics went through a painful reckoning with these problems. The credibility revolution (the shift toward natural experiments and quasi-experimental designs associated with Angrist, Imbens, and Card) happened because the field got honest about the fact that its earlier causal claims were not well supported by its methods. Transportation planning has not yet had that reckoning. The field continues to produce impact estimates with more apparent precision than the underlying evidence can support, and to treat model outputs as forecasts rather than as structured assumptions.

Three features make transportation evidence especially treacherous:

The built environment is the constraint, and we built it over 200 years

American communities were physically reorganized around the automobile over the twentieth century. The resulting land use patterns (low density, separated uses, hierarchical street networks, massive parking supply) are not policy variables that can be adjusted at the margin. They are infrastructure with fifty-year lifespans and political constituencies organized to protect them. An intervention that works well in a pre-war urban grid may produce entirely different results in a postwar suburb, not because the evidence is wrong but because the system it operates in is different. Transferability is not a statistical nuisance. It is the central problem.

The most important effects are system-level, and our evidence is local

Most credible transportation research estimates the effect of an intervention on the people or trips directly affected. But the outcomes we care about (regional VMT, air quality, safety, equitable access) are system-level properties that depend on how local effects propagate through networks, land use, and behavior. The remote work case is canonical: the behavioral prediction was correct (workers with remote options took fewer commute trips), but the system-level inference was wrong (total VMT did not fall), because the models excluded what happened when peak capacity was freed. Induced demand from capacity expansion is the mirror image: well-estimated at the corridor level, poorly understood at the system level where the mechanism operates through long-run land use change.

Decisions are irreversible and evidence arrives late

A highway alignment shapes land use for half a century. A rail investment commits a corridor for longer. A parking minimum embedded in a zoning code constrains development patterns for decades. The decisions that matter most in transportation planning are the ones that are hardest to reverse, and the evidence on their effects accumulates slowly and after the fact. The cost of being wrong is asymmetric: the downside of a bad irreversible decision is categorically worse than the downside of a bad reversible one, and the evidentiary threshold should reflect that. It almost never does.

III. The Field’s Confirmation Problem

Transportation planning has a confirmation problem. Across the political spectrum, advocates and practitioners select the evidence that supports their preferred interventions and dismiss the evidence that complicates them. This is not a partisan observation. It is a structural feature of a field that lacks systematic evidence review and has strong professional incentives to appear more certain than the evidence warrants.

Urbanist advocates invoke induced demand as a universal principle (build roads and traffic fills them), while the underlying elasticity evidence comes predominantly from capacity additions on already-congested urban corridors and may not generalize to every context. The implied policy conclusion (never add road capacity) goes beyond what the evidence supports, even though the evidence is strong within its domain. Conversely, opponents of transit investment point to national ridership declines as proof that transit is obsolete, while those declines reflect decades of service cuts, fare increases, land use decisions that made transit uncompetitive, and a pandemic. They are not a revealed preference against public transportation in any clean causal sense.

Both examples share the same structure: real evidence, deployed past its evidentiary warrant, in service of a conclusion arrived at on other grounds. Honest practitioners are left without a reliable guide. The loudest voices on any intervention are the ones with the least interest in characterizing uncertainty honestly.

More research alone does not fix this. What fixes it is infrastructure for systematic evidence review: a process that assembles the literature on specific interventions, grades it explicitly for quality and confidence, reports the full range of observed effects rather than cherry-picking point estimates, and is honest about what is not known. Medicine built this through the Cochrane Collaboration. Education built a version through the What Works Clearinghouse. Transportation planning has no equivalent.

IV. Two Kinds of Uncertainty

The intellectual core of this project is a distinction that I believe is original to this work, or at least not systematically developed elsewhere in the transportation literature.

Standard evidence frameworks, including the GRADE approach used in Cochrane reviews, assess one kind of uncertainty: how precisely have we estimated a specific effect? How many studies exist, how strong are their designs, how consistent are their findings, how directly do they apply to the population of interest? This is parametric uncertainty, and GRADE handles it well.

But the most consequential errors in transportation planning are not miscalculated elasticities. They are model boundaries drawn in the wrong place, outcome measures that do not capture what planning is actually for, and causal models that work within a stable system but fail when the system reorganizes. This is structural uncertainty, and no existing evidence framework in the transportation field assesses it.

The two-axis framework assesses every intervention on both dimensions. Parametric confidence answers: given the available studies, how sure are we about this effect size? Structural uncertainty answers: how sure are we that this is the right effect to be estimating, that the causal model captures the mechanisms that matter for the decision at hand?

Remote work sits at one extreme: high parametric confidence (the behavioral effect is well-established), high structural uncertainty (the system-level inference was wrong because the model boundary excluded capacity reabsorption). Speed limit reductions sit at the other: high parametric confidence and low structural uncertainty, because the causal mechanism is direct and does not depend on system reorganization. Residential density and VMT sit in between: moderate parametric confidence (the correlation is well-documented but self-selection is hard to separate) and moderate-to-high structural uncertainty (the relationship is mediated by transit availability, land use mix, and demographic composition in ways that vary across contexts).
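
One way to make the two axes concrete is to record both ratings for each intervention in a single structure. The sketch below is illustrative only: the class and field names are hypothetical, the four parametric levels follow GRADE's high/moderate/low/very low certainty scale, and the three-level structural scale is an assumption pending the operational criteria the framework still needs. Structural uncertainty is stored as a (lowest, highest) pair so that a rating such as "moderate-to-high" can be represented without forcing a single level. The entries restate the three examples in the paragraph above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Parametric(IntEnum):
    """GRADE-style certainty in an effect size (GRADE's four levels)."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


class Structural(IntEnum):
    """Hypothetical three-level scale for structural uncertainty."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class TwoAxisRating:
    intervention: str
    parametric: Parametric
    # (lowest, highest) pair so a rating can span levels, e.g. moderate-to-high.
    structural: tuple[Structural, Structural]
    rationale: str

    def __post_init__(self) -> None:
        lowest, highest = self.structural
        if lowest > highest:
            raise ValueError("structural range must be given as (lowest, highest)")


RATINGS = [
    TwoAxisRating("Remote work", Parametric.HIGH,
                  (Structural.HIGH, Structural.HIGH),
                  "Behavioral effect well established; model boundary excluded "
                  "reabsorption of freed peak capacity."),
    TwoAxisRating("Speed limit reductions", Parametric.HIGH,
                  (Structural.LOW, Structural.LOW),
                  "Direct causal mechanism; no dependence on system reorganization."),
    TwoAxisRating("Residential density and VMT", Parametric.MODERATE,
                  (Structural.MODERATE, Structural.HIGH),
                  "Self-selection hard to separate; effect mediated by transit, "
                  "land use mix, and demographics."),
]

for rating in RATINGS:
    lowest, highest = rating.structural
    span = lowest.name if lowest == highest else f"{lowest.name}-to-{highest.name}"
    print(f"{rating.intervention}: parametric={rating.parametric.name}, structural={span}")
```

Keeping the rationale next to the two ratings is deliberate: the framework depends on documented reasoning, not just the labels.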

Getting this framework operational means defining the levels precisely enough that two researchers would apply them consistently, testing it against cases where the right ratings are arguable, and documenting the reasoning. The framework is only useful if disagreements about ratings are productive rather than arbitrary.
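
Once the levels are defined, consistency between raters can be checked quantitatively. The sketch below uses Cohen's kappa, a standard chance-corrected agreement statistic for two raters. The choice of kappa is ours, not something the framework specifies, and the ratings in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical ratings to the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected is the
    agreement expected by chance from each rater's marginal rating frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must rate the same, non-empty set of items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical category for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical structural-uncertainty ratings of six interventions by two researchers.
a = ["low", "high", "moderate", "high", "low", "moderate"]
b = ["low", "high", "high", "high", "low", "moderate"]
print(round(cohens_kappa(a, b), 2))  # 0.75
```

Unweighted kappa treats a low/high disagreement the same as a low/moderate one. Because the levels are ordered, a weighted kappa that gives partial credit for adjacent ratings may be a better fit once the scales are fixed.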

V. What the Project Produces

A systematic evidence review of transportation interventions

Approximately twenty-two interventions, each assessed using the two-axis framework: GRADE-style parametric confidence by outcome domain, and structural uncertainty. For each intervention: the observed range of effect sizes across the literature (not a single point estimate), known moderators, context-dependence, distributional considerations, and the three to five most important sources. The evidence cells are populated through the review process, not beforehand. The master table is the commitment; filling it in is the research.

Outcome domains: travel behavior, access and mobility, safety, environmental, equity and distribution, wellbeing and experience. Gaps in evidence across domains are findings, not omissions.
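
A master table with explicit gap reporting could be as simple as a nested mapping from intervention to outcome domain to the studies reviewed. The sketch below assumes that layout (the function name, variable names, and placeholder entries are hypothetical) and lists the empty cells for each intervention, so that missing evidence shows up as a finding instead of disappearing from the table. Domain names are checked against the six listed above to catch typos.

```python
# The six outcome domains named in the framework.
OUTCOME_DOMAINS = (
    "travel behavior",
    "access and mobility",
    "safety",
    "environmental",
    "equity and distribution",
    "wellbeing and experience",
)


def evidence_gaps(master_table):
    """Map each intervention to the outcome domains with no reviewed studies.

    master_table: {intervention: {outcome domain: [study identifiers]}}.
    Missing or empty cells are reported as gaps rather than silently dropped.
    """
    gaps = {}
    for intervention, cells in master_table.items():
        unknown = set(cells) - set(OUTCOME_DOMAINS)
        if unknown:
            raise ValueError(f"{intervention}: unrecognized domains {sorted(unknown)}")
        gaps[intervention] = [d for d in OUTCOME_DOMAINS if not cells.get(d)]
    return gaps


# Hypothetical placeholder entries; real cells are filled in by the review process.
table = {
    "Intervention A": {"safety": ["study-1", "study-2"], "travel behavior": ["study-3"]},
    "Intervention B": {domain: ["study-4"] for domain in OUTCOME_DOMAINS},
}
for intervention, missing in evidence_gaps(table).items():
    print(intervention, "->", missing or "no gaps")
```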

An effect-size reference organized for practitioners

By intervention, then outcome domain: observed range (25th to 75th percentile), median, number of studies, study contexts, dominant designs. This is the reference class resource that lets a practitioner anchor an estimate in the literature rather than in a model assumption or a vendor claim. It draws on both program evaluation evidence and estimated coefficients from travel demand models, which encode real behavioral knowledge that the evidence synthesis tradition has largely ignored.
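
The summary statistics for a single intervention and outcome cell can be computed directly from study-level effects. This is a minimal sketch, assuming each study has already been converted to a common effect metric (for example, an elasticity); the `Study` fields, labels, and example values are hypothetical. Percentiles use linear interpolation (`method="inclusive"` in Python's `statistics.quantiles`), and studies are weighted equally, which is a simplification compared with precision-weighted meta-analysis.

```python
import statistics
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    effect: float  # effect size on a common metric, e.g. an elasticity
    context: str   # hypothetical context label
    design: str    # hypothetical study-design label


def summarize_cell(studies):
    """Observed range, median, count, contexts, and dominant design for one cell."""
    if len(studies) < 2:
        raise ValueError("need at least two studies to report an interquartile range")
    effects = [s.effect for s in studies]
    # Linear interpolation between order statistics; min and max are the 0th and 100th percentiles.
    p25, median, p75 = statistics.quantiles(effects, n=4, method="inclusive")
    return {
        "p25": p25,
        "median": median,
        "p75": p75,
        "n_studies": len(studies),
        "contexts": sorted({s.context for s in studies}),
        "dominant_design": Counter(s.design for s in studies).most_common(1)[0][0],
    }


# Hypothetical studies for illustration only; not drawn from the literature.
cell = [
    Study(-0.10, "urban", "difference-in-differences"),
    Study(-0.20, "suburban", "cross-sectional"),
    Study(-0.15, "urban", "difference-in-differences"),
    Study(-0.30, "suburban", "cross-sectional"),
    Study(-0.25, "urban", "difference-in-differences"),
]
print(summarize_cell(cell))
```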

A reference class estimation approach

A practical methodology for practitioners who need a defensible estimate without running a full travel demand model. Start from an observed base distribution of local travel behavior. Apply empirically grounded sensitivities from the effect-size reference. Document assumptions explicitly. Report the uncertainty range. The result is a structured prior, not a forecast, and it is transparent about the difference.
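
These steps can be sketched in a few lines. This is a minimal illustration, assuming a constant-elasticity response (the outcome scales by the ratio of new to current driver values raised to the elasticity) and using the interquartile elasticities from the effect-size reference as low, central, and high sensitivities; the function name, inputs, and numbers are hypothetical. The assumptions are returned alongside the estimates so they travel with the result.

```python
import statistics


def reference_class_estimate(base_observations, driver_ratio, elasticities, assumptions):
    """Return a structured prior (not a forecast) for an outcome after a change in one driver.

    base_observations: observed local values of the outcome, e.g. daily VMT per household.
    driver_ratio: proposed value of the driver divided by its current value (1.10 = +10%).
    elasticities: (p25, median, p75) sensitivities taken from the effect-size reference.
    assumptions: plain-language statements that should accompany the estimate.
    """
    if driver_ratio <= 0:
        raise ValueError("driver_ratio must be positive")
    base_mean = statistics.fmean(base_observations)
    # Constant-elasticity response: outcome scales by driver_ratio ** elasticity.
    labels = ("elasticity_p25", "elasticity_median", "elasticity_p75")
    estimates = {
        label: base_mean * driver_ratio ** e for label, e in zip(labels, elasticities)
    }
    return {"base_mean": base_mean, "estimates": estimates, "assumptions": list(assumptions)}


# Hypothetical inputs for illustration only.
result = reference_class_estimate(
    base_observations=[18.0, 22.0, 25.0, 15.0],
    driver_ratio=1.10,
    elasticities=(-0.25, -0.20, -0.15),
    assumptions=[
        "Sensitivities transfer from the studied contexts to this one.",
        "The response follows a constant elasticity over a 10% change.",
    ],
)
for label, value in result["estimates"].items():
    print(f"{label}: {value:.1f} vs. base {result['base_mean']:.1f}")
```

Applying one sensitivity to the mean of the base distribution ignores differences in response across households; a fuller version would apply segment-specific sensitivities where the reference supports them.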

VI. On Scope

The project enters through transportation interventions, but the outcomes it cares about (access, safety, environmental quality, equity, wellbeing) are not contained within transportation. Housing policy, land use regulation, public health infrastructure, and economic development all shape the same outcomes through overlapping causal pathways. A parking minimum reform is simultaneously a transportation intervention, a housing policy, and an economic development decision.

I am not proposing to review all of urban planning. The interventions are transportation and land use interventions. But the outcomes framework deliberately reaches beyond transportation metrics, because the point is to assess whether interventions produce outcomes that matter for communities, not just outcomes that transportation models are set up to measure. Mode share is an intermediate metric. Whether a sixty-year-old who can no longer drive can still get to her doctor is an outcome. The project is scoped by intervention type but evaluated against community-level outcomes, and that tension is a feature of the design.

VII. What This Needs

This is not a side project. Building the evidence infrastructure that transportation planning lacks is a multi-year effort requiring sustained intellectual investment from people with real methodological depth.

The systematic reviews need researchers who can assess study design quality and distinguish between studies that contribute genuine evidence and studies that contribute volume. The structural uncertainty ratings need people with enough modeling experience to know where model boundaries are drawn and what they exclude. The effect-size reference needs people who can work across the program-evaluation and travel-demand-modeling traditions, which have operated in parallel without synthesizing their findings. The practitioner tools need people who understand how planners actually work and what kind of evidence product would be useful rather than decorative.

No single institution has all of this. The project will only succeed if it draws on multiple sources. The Zephyr Foundation is a natural convening platform. Federal research programs have funded analogous evidence infrastructure in other fields, and transportation research programs such as NCHRP and FHWA's are plausible sponsors here. The institutional form is an open question.

What I bring is twenty years of experience with the underlying data and analytical systems, the intellectual framework, and a willingness to spend years on this. What I am looking for is collaborators who are genuinely interested in the problem, have the methodological depth to make the work credible, and share the commitment to intellectual honesty. The project lives or dies on whether the people working on it are willing to rate an intervention honestly even when the rating is politically inconvenient.

VIII. Where to Start

The framework needs to be tested before it can be scaled. Four near-term priorities:

First, operationalize the structural uncertainty levels. Low, moderate, and high need criteria precise enough that two researchers would apply them consistently to the same intervention.

Second, draft two pilot intervention reviews end-to-end. Congestion pricing (deepest literature, most contested policy implications) and speed limit reductions (clean causal structure, instructive contrast with more complex interventions). Completing both would test whether the review template, the rating process, and the effect-size synthesis are workable.

Third, convene a small working group. Not a large committee. Five to eight people who can commit to reading the pilot reviews critically and pressure-testing the ratings. Zephyr is the natural venue.

Fourth, develop the public presentation. The evidence reviews, master table, and reference class tools need a freely accessible home designed for practitioners who need answers. That home is unicairn.com initially, with the expectation that the institutional form may evolve.

Transportation planning deserves the same quality of evidence infrastructure that medicine and education have built for themselves. The execution is hard, because the causal problems are hard and because the field’s habits of confirmation bias make honest assessment politically uncomfortable. But continuing to deploy evidence selectively in support of decisions already made is not a stable equilibrium. The practitioners I talk to know it, the researchers know it, and the decision-makers who have to defend their choices in public are starting to know it too.

The question is not whether this infrastructure is needed. The question is whether the field is ready to build it honestly. I think it is.

Suzanne Childress | childressssuzanne@gmail.com | unicairn.com

April 2026