3/2/2017

Overview

Motivation: economists are increasingly using weather data to estimate impacts and climate models to estimate the implications of those impacts. Unfortunately, sometimes they make mistakes.

Today we'll talk about:

  • Weather data: sources and pitfalls.
  • Climate Models: the (very basic) workings and common mistakes.

There will be NO equations.

Contribution

How can a paper with no model, no equations, and no identification (and nearly no economics or policy) get published in REEP?

  • It is very useful.
  • It is very clear.
  • The analysis is enough to make the points.
  • It is going to be cited many times (127 so far).

This paper is a manual on how to use weather data and climate models properly in economic research. It should improve the quality of work in that area and make life easier for many researchers.

Weather vs. Climate

  • Weather: how much did it rain today?
  • Climate: average long-term rainfall at a given location.
  • Under these definitions, other moments of the distribution aren't included.

Weather Data

Spatial interpolation vs reanalysis

Spatial interpolation estimates the weather at grid cells using daily weather station data, elevation, wind direction, rain shadows, and "many other" features. Think of it as prediction from an econometric model.

Data assimilation/reanalysis uses a physics-based model to extend information into regions where observations are sparse. Think of it as prediction from a structural model.
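To make the interpolation idea concrete, here is a minimal sketch of one simple flavor of spatial interpolation, inverse-distance weighting. The station locations, readings, and the `idw_interpolate` helper are all hypothetical; real products use far richer predictors (elevation, wind direction, rain shadows, etc.).

```python
import numpy as np

def idw_interpolate(station_xy, station_temp, cell_xy, power=2):
    """Estimate weather at a grid-cell centroid as an inverse-distance-
    weighted average of nearby station readings (a toy stand-in for the
    much richer statistical models real interpolated products use)."""
    d = np.linalg.norm(station_xy - cell_xy, axis=1)
    w = 1.0 / d**power                      # closer stations get more weight
    return np.sum(w * station_temp) / np.sum(w)

# Three made-up stations (km coordinates) and one grid cell at the origin
stations = np.array([[10.0, 0.0], [0.0, 20.0], [30.0, 30.0]])
temps = np.array([21.0, 19.0, 25.0])
estimate = idw_interpolate(stations, temps, np.array([0.0, 0.0]))
```

The estimate lands between the station readings and closest to the nearest station's value, which is exactly the "prediction from a model fit to nearby observations" intuition above.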

Five Key Pitfalls

  • Averages are similar across products, but variances differ.
  • Averaging non-missing station data creates measurement error.
  • Correlation across space of weather variables varies substantially.
  • Products often display significant spatial correlation, which can lead to multicollinearity.
  • Since weather stations come in and out of existence, there may be artificial breakpoints.

Pitfall 1: Choosing a dataset

Datasets agree on climate but not on weather, and weather is usually the variation we identify off of.

Dataset     Method        Resolution    Frequency   Min   Max   Mean   Precipitation
CRU         Interpolated  0.5° x 0.5°   Monthly     Y     Y     N      Y
UDEL        Interpolated  0.5° x 0.5°   Monthly     N     N     Y      Y
NCEP/NCAR   Reanalysis    nonuniform    Daily       Y     Y     N      Y

Other options: ECMWF (reanalysis), PRISM (interpolated, US only)

Pitfall 1: Choosing a dataset

  • Averaging by country over time, the datasets match very closely (correlations 0.98-0.99 for temperature and 0.88-0.98 for precipitation). The data agree on which places are generally hot and which are generally cold.
  • Correlations between dataset estimates of annual deviations from the country mean are lower (0.72-0.92 for temperature, 0.26-0.70 for precipitation). Even CRU and UDEL, which both use statistical interpolation of station data, drop to the 0.7 range.
  • Looking at the grid-cell level, correlations are much higher in the U.S., where there are many stations, than elsewhere.

Implication: measurement error can lead to attenuation bias, so FE methods in particular could suffer. Use multiple datasets as a robustness check.
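A minimal simulation (all numbers made up) shows why products can agree on levels yet disagree on deviations. Two hypothetical datasets observe the same truth with independent measurement error: pooled levels correlate almost perfectly because cross-country climate differences dominate, while within-country deviations, the variation fixed-effects regressions actually use, correlate far less.

```python
import numpy as np

rng = np.random.default_rng(0)
n_countries, n_years = 20, 30

# Hypothetical truth: big differences across countries (climate),
# smaller year-to-year fluctuations (weather).
climate = rng.uniform(0, 30, (n_countries, 1))         # country means
weather = rng.normal(0, 1.0, (n_countries, n_years))   # annual shocks
truth = climate + weather

# Two gridded products observe the truth with independent measurement error.
a = truth + rng.normal(0, 0.7, truth.shape)
b = truth + rng.normal(0, 0.7, truth.shape)

# Pooled levels: both products agree on which countries are hot or cold.
corr_levels = np.corrcoef(a.ravel(), b.ravel())[0, 1]

# Deviations from each country's mean: measurement error now dominates
# the small weather shocks, so the products agree much less.
dev_a = a - a.mean(axis=1, keepdims=True)
dev_b = b - b.mean(axis=1, keepdims=True)
corr_devs = np.corrcoef(dev_a.ravel(), dev_b.ravel())[0, 1]
```

This mirrors the pattern above: near-perfect agreement on climate, much weaker agreement on the weather variation that identification relies on.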

Pitfall 2: Averaging station-level data across space

Spatial averages jump when stations go on or offline.

When location and time fixed effects are included, these spurious jumps can account for much (and even most) of the remaining variation.

Possible alternative: fill in missing weather station data by regressing it on closest surrounding stations and predicting missing observations.
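A sketch of that alternative, with simulated data and a hypothetical `fill_missing` helper: regress the gappy station's series on its nearest neighbors over the days all are observed, then predict the missing days. This keeps the set of stations entering the spatial average fixed, avoiding the jumps.

```python
import numpy as np

def fill_missing(target, neighbors):
    """Fill gaps in one station's series by regressing it on nearby
    stations (OLS on the overlapping days, then predict the gaps).
    target: 1-D array with np.nan gaps; neighbors: 2-D (days x stations)."""
    X = np.column_stack([np.ones(len(target)), neighbors])
    obs = ~np.isnan(target)
    beta, *_ = np.linalg.lstsq(X[obs], target[obs], rcond=None)
    filled = target.copy()
    filled[~obs] = X[~obs] @ beta           # predict only the missing days
    return filled

rng = np.random.default_rng(1)
regional = rng.normal(20, 5, 365)                          # shared signal
neighbors = regional[:, None] + rng.normal(0, 1, (365, 2)) # two nearby stations
target = 0.9 * regional + rng.normal(0, 1, 365)
target[100:110] = np.nan                                   # ten missing days
filled = fill_missing(target, neighbors)
```

Observed days are untouched; the ten imputed days track the underlying regional signal rather than dropping out of the average.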

Pitfall 3: Variable Correlation

Issue: precipitation and temperature are correlated, so omitting one biases the coefficient on the other.

Solution: include all the relevant weather variables in the regression rather than entering them one at a time.

Pitfall 4: Spatial Correlation

Issue: spatial correlation could lead to biased standard errors.

Pitfall 4: Spatial Correlation - Solution

Adjust for spatial correlation to remove the bias in standard errors:

  1. Use a spatial weighting matrix.
  2. Use Conley's (1999) nonparametric approach.
  3. Use a grouped bootstrap, resampling years with replacement. This requires year-to-year fluctuations to be random (which is inconsistent with things like El Niño).

In data-sparse regions, several grids may be linked to the same stations, exacerbating multicollinearity, because they will only differ due to station weighting. Check for this.
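Option 3 can be sketched in a few lines. This is a toy simulation (made-up data-generating process): a common year shock makes observations within a year correlated across regions, and the grouped bootstrap resamples whole years, keeping every region's observation for a drawn year together, so that within-year spatial correlation is preserved in each resample.

```python
import numpy as np

rng = np.random.default_rng(2)
n_regions, n_years = 50, 40

# Hypothetical panel: a common year shock hits both temperature and the
# outcome, so observations within a year are not independent across regions.
year_shock = rng.normal(0, 1, n_years)
temp = year_shock[None, :] + rng.normal(0, 1, (n_regions, n_years))
y = 2.0 * temp + 3.0 * year_shock[None, :] + rng.normal(0, 1, (n_regions, n_years))

def ols_slope(x, y):
    x, y = x.ravel(), y.ravel()
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Grouped bootstrap: draw YEARS with replacement, taking every region's
# observation for each drawn year as one block.
draws = []
for _ in range(500):
    yrs = rng.integers(0, n_years, n_years)
    draws.append(ols_slope(temp[:, yrs], y[:, yrs]))
boot_se = np.std(draws, ddof=1)
```

The standard deviation of the slope across resamples is the bootstrap standard error; an i.i.d. bootstrap over individual observations would understate it here.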

Pitfall 5: Endogenous Coverage

Measurement error may be correlated with the thing we are trying to explain.

Example: Romania's data resolution was increasing until 1988; after the Iron Curtain fell, it dropped precipitously. Studying how farmers respond to weather shocks before/after the fall would be dangerous.

Also notably true for natural disasters, which may themselves destroy weather stations.

Climate Models and Their Output

What would we use a climate model for?

In many cases, once some sort of relationship between historical climate and some outcome is found, we would like to project how that may matter.

How do GCMs work?

They approximate the atmosphere and ocean (fluids) with numbers on a grid, then use fluid-mechanics models to predict their change over time. Using an iterative approach, they gradually project into the future.

Vegetation responses, cloud formation, rainfall, and many other processes are involved and can be modeled differently. Picking the "best" model is complicated. Consult experts.

Human activity is treated as exogenous, so the models include no adaptation.

Here's a simple tutorial, and a full GCM.

How different are GCM results?

Burke et al. (2011) find that half of economic studies use the Hadley model. Many used only the Hadley model. There's no reason to prefer that model, and models vary substantially.

Potential solution: model averaging, or using multiple models. Either way, report variances (or individual model results) as well as averages.
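A minimal sketch of that reporting practice, with made-up warming projections (real values come from the model output archives): average across models, but report the spread alongside the mean rather than a single model's number.

```python
import numpy as np

# Hypothetical end-of-century warming projections (deg C) for one region
# from five GCMs; the model names and numbers are invented for illustration.
projections = {"ModelA": 2.1, "ModelB": 3.4, "ModelC": 1.8,
               "ModelD": 4.0, "ModelE": 2.7}

vals = np.array(list(projections.values()))
mean_warming = vals.mean()
spread = vals.std(ddof=1)

# Report the ensemble mean AND the across-model spread (or each model's
# individual result), never just one model's projection.
print(f"ensemble mean: {mean_warming:.2f} C, s.d. across models: {spread:.2f} C")
```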

Aggregation bias

GCMs divide the earth's surface into a grid, so their climate statistics are homogeneous within each 2x2 degree grid cell (about 138 miles across at the equator).

Bias is most significant in mountainous or coastal areas (where climate changes a lot in a small space).

Using this to study the economic impact of climate change is problematic when effects are nonlinear, but there's a solution…

Aggregation bias in pictures

How to deal with aggregation bias

Two options:

  1. "Downscale" the GCM predictions by using regressions that relate historical fine-resolution data to the GCM grid. Downscaled datasets are available for California, nationally, and globally.

  2. Simply add the predicted change in the GCM to the baseline climate. However, this will miss any changes in variance across space.
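Option 2 is one line of arithmetic. In this sketch (hypothetical baseline values for four fine-resolution sub-cells inside one coarse GCM cell), the coarse predicted change is added to each fine-resolution baseline, preserving observed spatial contrasts exactly:

```python
import numpy as np

# Hypothetical fine-resolution baseline climate (observed) for four
# sub-cells nested inside one coarse GCM cell, in deg C.
baseline_fine = np.array([12.0, 15.5, 18.0, 21.5])

# The GCM predicts the COARSE cell warms by 2.3 C; add that change to
# each fine-resolution baseline value.
gcm_change = 2.3
projected_fine = baseline_fine + gcm_change

# The baseline's spatial variance is preserved unchanged, which is the
# drawback noted above: any GCM-predicted change in variance is missed.
```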

Conclusions

Gridded datasets are great, but all have errors. Use more than one and properly account for station birth/death biases.

Global climate models are also great, but can lead to incorrect estimated impacts. Account for their location-specific biases.