
Evidence Aggregation and External Validity

Network Conveners: Ed Vytlacil and Aprajit Mahajan

Social science research often aspires to be a forward-looking guide to policy. We might hope to expand programs that research shows to be promising, and to end or modify programs that research shows to be ineffective. Unfortunately, the results of an internally valid trial, i.e., valid for the sample of people involved in the research, may not predict the effects on the population a program eventually serves. The question of external validity, that is, how a program works outside the context of existing evaluations, complicates decisions to scale up, or even continue, programs that appear promising.

Social norms, institutions, and even the weather can mediate the effects of a program, causing those effects to vary across locations, over time, and with the characteristics of beneficiaries. In practice, the results of randomized evaluations of the same intervention vary substantially across trials (Vivalt, 2015). Even within the same location, causal effects vary because conditions fluctuate randomly over time (Rosenzweig and Udry, 2016). In addition, there is clear evidence of selection bias on factors such as location in existing randomized evaluations (Allcott, 2015). Replication studies and subsequent meta-analyses will be useful for aggregating results from different contexts. Evidence from multiple evaluations can help build theory about why some experiments succeed and others fail, while also offering predictions that could be tested in future experiments (Acemoglu, 2010; Banerjee et al., 2016).
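As one concrete illustration of such aggregation, a random-effects meta-analysis treats each trial's estimated effect as a noisy draw from a context-specific true effect, and summarizes both the average effect and the variance of effects across contexts. The sketch below is a minimal DerSimonian-Laird implementation in Python; the function name and its inputs (per-trial point estimates and standard errors) are our own assumptions for illustration, not a procedure taken from any of the cited papers.

```python
import numpy as np

def random_effects_meta(estimates, std_errors):
    """Pool treatment-effect estimates from several trials, allowing the
    true effect to differ across contexts (DerSimonian-Laird)."""
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(std_errors, dtype=float) ** 2

    # Inverse-variance (fixed-effect) weights and pooled mean.
    w = 1.0 / v
    mu_fe = np.sum(w * y) / np.sum(w)

    # Cochran's Q: dispersion of trial estimates around the pooled mean.
    q = np.sum(w * (y - mu_fe) ** 2)
    k = len(y)

    # DerSimonian-Laird moment estimate of between-trial variance tau^2.
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Re-weight with tau^2 so precise but atypical trials are not over-weighted.
    w_re = 1.0 / (v + tau2)
    mu_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return mu_re, se_re, tau2
```

The estimated tau^2 is itself informative for external validity: when it is large relative to the trials' sampling variances, differences across contexts, not sampling noise, dominate the variation in results.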

Y-RISE supports research that helps determine how far we can generalize from existing evaluations. There is a need for sound statistical procedures that quantify our uncertainty about the impacts of a program implemented in a new set of circumstances. Developing standard measures of robustness to changes in population, location, or time; methods for adjusting confidence intervals; and ways of bounding the bias these problems create would all be valuable.
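One standard way to express that uncertainty is a prediction interval for the effect in a context not yet studied, which widens the usual confidence interval for the average effect by the between-context variance tau^2 (Higgins, Thompson, and Spiegelhalter, 2009). The sketch below is an illustration under that approach, not a procedure prescribed by Y-RISE; it consumes the outputs of the meta-analysis sketch above, and the t distribution with k - 2 degrees of freedom follows the Higgins et al. approximation.

```python
import numpy as np
from scipy import stats

def prediction_interval(mu_re, se_re, tau2, k, level=0.95):
    """Approximate interval for the effect a NEW context might realize.

    Unlike a confidence interval for the average effect, this accounts
    for between-context heterogeneity (tau2), so it speaks directly to
    the external-validity question. Requires k >= 3 trials.
    """
    t_crit = stats.t.ppf(0.5 + level / 2.0, df=k - 2)
    half = t_crit * np.sqrt(tau2 + se_re ** 2)
    return mu_re - half, mu_re + half

# Illustrative (made-up) estimates from five trials of one intervention:
# est = [0.12, 0.30, 0.05, 0.22, -0.02]; se = [0.06, 0.08, 0.05, 0.07, 0.06]
# mu, se_mu, tau2 = random_effects_meta(est, se)
# print(prediction_interval(mu, se_mu, tau2, k=5))
```

A prediction interval much wider than the confidence interval is a useful warning sign for scale-up decisions: the average effect may be well estimated even while the effect in any one new setting remains highly uncertain.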
