Motivation: economists are increasingly using weather data to estimate impacts and climate models to estimate the implications of those impacts. Unfortunately, sometimes they make mistakes.

Today we'll talk about:

  • Weather data: sources and pitfalls.
  • Climate Models: the (very basic) workings and common mistakes.

There will be NO equations.


How can a paper with no model, no equations, and no identification (and nearly no economics or policy) get published in REEP?

  • It is very useful.
  • It is very clear.
  • The analysis is enough to make the points.
  • It is going to be cited many times (127 so far).

This paper is a manual on how to use weather data and climate models properly in economic research. It should improve the quality of work in that area and make life easier for many researchers.

Weather vs. Climate

  • Weather: how much did it rain today?
  • Climate: average long-term rainfall at a given location.
  • Under these definitions, climate refers only to the long-run mean; other moments of the distribution (such as the variance) aren't included.

Weather Data

Spatial interpolation vs reanalysis

Spatial interpolation estimates the weather at grid cells using daily weather station data, elevation, wind direction, rain shadows, and "many other" features. Think of predictions from an econometric model.

Data assimilation/reanalysis combines observations with a physics-based model, which fills in information where observations are sparse. Think of predictions from a structural model.

Five Key Pitfalls

  • Averages are similar between products, but variances differ.
  • Averaging non-missing station data creates measurement error.
  • Correlation between weather variables varies substantially across space.
  • Products often display significant spatial correlation, which can lead to multicollinearity.
  • Since weather stations come in and out of existence, there may be artificial breakpoints.

Pitfall 1: Choosing a dataset

Datasets agree on climate, but not weather. Usually weather is what we're identifying on.

Dataset    Method        Resolution   Frequency  Min temp  Max temp  Mean temp  Precipitation
CRU        Interpolated  0.5° x 0.5°  Monthly    Y         Y         N          Y
UDEL       Interpolated  0.5° x 0.5°  Monthly    N         N         Y          Y
NCEP/NCAR  Reanalysis    nonuniform   Daily      Y         Y         N          Y

Other options: ECMWF (reanalysis), PRISM (interpolated, US only)

Pitfall 1: Choosing a dataset (continued)

  • Averaging by country over time, the datasets match very closely (correlations 0.98-0.99 for temperature and 0.88-0.98 for precipitation). The data agree on which places are generally hot and which are generally cold.
  • Correlations between dataset estimates of annual deviations from the country mean are lower (0.72-0.92 for temperature, 0.26-0.70 for precipitation). Even CRU and UDEL, which both use statistical interpolation of station data, drop down to the 0.7 range.
  • Looking at the grid-cell level, correlations are much higher in the U.S., where there are many stations, than elsewhere.
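
The gap between the two sets of correlations can be reproduced in a small simulation. The sketch below is illustrative only: the two "products", their error sizes, and the variances of climate and weather are all made-up assumptions, not values from CRU, UDEL, or NCEP/NCAR. It shows that two noisy measurements of the same truth can agree almost perfectly on country means while agreeing much less on annual deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_countries, n_years = 50, 30

# Assumed truth: climate (country mean) varies a lot, weather (annual deviation) varies little.
climate = rng.normal(20.0, 8.0, size=(n_countries, 1))
weather = rng.normal(0.0, 0.5, size=(n_countries, n_years))
truth = climate + weather

# Two hypothetical data products observe the truth with independent errors.
product_a = truth + rng.normal(0.0, 0.4, size=truth.shape)
product_b = truth + rng.normal(0.0, 0.4, size=truth.shape)

def corr(x, y):
    """Pearson correlation of two arrays, flattened."""
    return np.corrcoef(np.ravel(x), np.ravel(y))[0, 1]

def demean(x):
    """Subtract each country's own mean over time."""
    return x - x.mean(axis=1, keepdims=True)

# Agreement on climate: correlation of country averages.
corr_means = corr(product_a.mean(axis=1), product_b.mean(axis=1))
# Agreement on weather: correlation of annual deviations from the country mean.
corr_deviations = corr(demean(product_a), demean(product_b))

print(f"correlation of country means:      {corr_means:.3f}")
print(f"correlation of annual deviations:  {corr_deviations:.3f}")
```

Removing the country mean strips out the large, well-measured climate signal, so the measurement error makes up a much larger share of what is left.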

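Implication: measurement error can lead to attenuation bias. Fixed-effects (FE) methods in particular could suffer, because the fixed effects absorb the climate signal that the datasets agree on and leave only the noisier deviations. Use multiple datasets as a robustness check.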
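
A minimal sketch of the attenuation problem, under assumed numbers: the true effect `beta`, the variance of weather, and the size of the measurement error are all hypothetical choices for illustration. The pooled regression leans on cross-sectional climate differences and is barely affected, while the within (fixed-effects) regression is biased toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_years, beta = 200, 30, 1.0  # beta is the assumed true effect

climate = rng.normal(20.0, 8.0, size=(n_units, 1))
weather = rng.normal(0.0, 0.5, size=(n_units, n_years))
temp_true = climate + weather

# Observed temperature = truth + classical measurement error (assumed sd 0.5).
temp_obs = temp_true + rng.normal(0.0, 0.5, size=temp_true.shape)
outcome = beta * temp_true + rng.normal(0.0, 1.0, size=temp_true.shape)

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    x = x - x.mean()
    y = y - y.mean()
    return (x * y).sum() / (x * x).sum()

def within(x):
    """Within transformation: remove each unit's mean (unit fixed effects)."""
    return x - x.mean(axis=1, keepdims=True)

beta_pooled = slope(temp_obs.ravel(), outcome.ravel())
beta_fe = slope(within(temp_obs).ravel(), within(outcome).ravel())

print(f"true beta:        {beta:.2f}")
print(f"pooled estimate:  {beta_pooled:.2f}")
print(f"FE estimate:      {beta_fe:.2f}")
```

With equal variances for weather and measurement error, roughly half of the within-unit variation in observed temperature is noise, so the FE estimate shrinks to about half the true effect.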

Pitfall 2: Averaging station-level data across space

Spatial averages jump when stations come online or go offline.

When location and time fixed effects are included, these jumps can account for much (and even most) of the remaining variation.

Possible alternative: fill in missing weather station data by regressing each station's readings on those of the closest surrounding stations and predicting the missing observations.
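
The fill-in idea can be sketched as follows. Everything here is simulated and hypothetical: one target station, three neighbors with complete records, a shared regional signal, and a 60-day outage at the target. The regression is fit only on days when the target reported, then used to predict the gap.

```python
import numpy as np

rng = np.random.default_rng(2)
n_days = 365
days = np.arange(n_days)

# Assumed regional temperature signal shared by nearby stations.
regional = 15.0 + 10.0 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 2, n_days)

# Three hypothetical neighbors (complete records) and one target station.
neighbors = np.column_stack(
    [regional + offset + rng.normal(0, 0.5, n_days) for offset in (-1.0, 0.5, 2.0)]
)
target_true = regional + 3.0 + rng.normal(0, 0.5, n_days)

# The target station goes offline for the last 60 days.
target = target_true.copy()
target[-60:] = np.nan
observed = ~np.isnan(target)

# OLS of the target on its neighbors (with intercept), using reporting days only.
X = np.column_stack([np.ones(n_days), neighbors])
coef, *_ = np.linalg.lstsq(X[observed], target[observed], rcond=None)

# Predict the missing days from the neighbors' readings.
filled = target.copy()
filled[~observed] = X[~observed] @ coef

rmse = np.sqrt(np.mean((filled[~observed] - target_true[~observed]) ** 2))
print(f"RMSE of filled values against the simulated truth: {rmse:.2f}")
```

Once every station has a complete (observed or predicted) series, the spatial average is taken over a fixed set of stations, which avoids jumps caused purely by changes in station coverage.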

Pitfall 3: Variable Correlation

Issue: precipitation and temperature are correlated, so leaving one out loads part of its effect onto the other (omitted variable bias).

Solution: include all the variables.
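
A small simulated example of why this matters. The coefficients and the negative temperature-precipitation correlation below are assumptions chosen for illustration, not estimates from any dataset.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Assumed data-generating process: hotter periods tend to be drier.
temp = rng.normal(0, 1, n)
precip = -0.6 * temp + rng.normal(0, 0.8, n)
outcome = -1.0 * temp + 0.5 * precip + rng.normal(0, 1, n)

def ols(y, *regressors):
    """OLS with an intercept; returns slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

beta_short = ols(outcome, temp)[0]      # precipitation omitted
beta_long = ols(outcome, temp, precip)  # both variables included

print(f"temperature effect, precipitation omitted:  {beta_short:.2f}")
print(f"temperature effect, precipitation included: {beta_long[0]:.2f}")
print(f"precipitation effect:                       {beta_long[1]:.2f}")
```

In this setup the short regression overstates the temperature effect (about -1.3 instead of -1.0) because temperature picks up the effect of the drier conditions that come with it.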

Pitfall 4: Spatial Correlation

Issue: spatial correlation could lead to biased standard errors.
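
To illustrate the size of the problem, the sketch below simulates data where both the weather variable and the error term share a regional component, then compares the conventional OLS standard error with a cluster-robust one. Clustering by region is used here only as a simple stand-in for a spatial correction; it is an assumption of this example, as are the region structure and all parameter values.

```python
import numpy as np

rng = np.random.default_rng(4)
n_regions, n_per_region = 50, 20
region = np.repeat(np.arange(n_regions), n_per_region)

# Weather and the error both share a regional shock, so nearby units are correlated.
x = rng.normal(0, 1, n_regions)[region] + rng.normal(0, 0.3, region.size)
e = rng.normal(0, 1, n_regions)[region] + rng.normal(0, 0.3, region.size)
y = 0.0 * x + e  # the assumed true effect is zero

X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Conventional standard error: treats all observations as independent.
sigma2 = resid @ resid / (len(y) - X.shape[1])
se_naive = np.sqrt(sigma2 * XtX_inv[1, 1])

# Cluster-robust (sandwich) standard error: allows arbitrary correlation within a region.
meat = np.zeros((2, 2))
for g in range(n_regions):
    in_region = region == g
    score = X[in_region].T @ resid[in_region]
    meat += np.outer(score, score)
se_cluster = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])

print(f"conventional SE:    {se_naive:.3f}")
print(f"cluster-robust SE:  {se_cluster:.3f}")
```

The conventional standard error treats 1,000 correlated observations as if they were independent, so it is several times too small and would reject a true null far too often.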