Replicating Missirian & Schlenker (2017): Introduction

Joshua Brinks, ISciences, LLC

Highlights:

  • We attempted to replicate a recent high-impact environment-security study.
  • We successfully demonstrated their data pre-processing and visualization steps.
  • We were unable to recreate their statistical modeling efforts.

Abstract:

Parsing leading peer-reviewed research, comprehending its underlying methodology, and adapting it to one's own needs poses a major barrier to early-career scientists. The challenge is compounded by publications with inadequate written methods, poorly conveyed data-processing procedures, and insufficient detail regarding software and package use. These issues could be remedied if, for example, researchers provided their underlying code along with submissions. This approach would ensure full transparency and serve as a teaching mechanism for graduate students and early-career scientists. In this presentation, we demonstrate the use of custom R packages to promote reproducible and distributable research by presenting our efforts to replicate Missirian and Schlenker's 2017 Asylum applications respond to temperature fluctuations, an investigation of agro-economic drivers of asylum applications to European Union member nations.

Keywords:

replication, spei, raster, gadm

Contents:

Introduction

This is the first entry in the DANTE Project’s efforts to demonstrate modern analytical techniques through replication and to develop a set of generalized R functions that make these analyses more accessible. Over the course of several vignettes, I will attempt to reverse engineer Missirian and Schlenker’s 2017 publication examining European asylum applications in response to temperature fluctuations in non-OECD countries during 2000–2014. This well-received Science paper has, according to Google Scholar, accrued nearly 100 citations since its release. As with most Science and PNAS submissions, a detailed supplementary materials section accompanies the truncated featured manuscript.

My intent is to make high-impact research more accessible to researchers who are just starting out or who are intimidated by modern quantitative techniques. Moreover, I hope to demonstrate the difficulty of replicating modern publications even when detailed supplementary materials are provided. The methods and supplementary materials for this publication are detailed in comparison to many peer-reviewed publications. I started by thoroughly reviewing both. After reviewing these materials and planning an analytical workflow to mimic the manuscript, I was left with several questions regarding the data processing and statistical modeling.

  1. Why was the Monfreda and Ramankutty cropping data used over more recent data from MapSPAM; presumably to maintain congruency with cropping calendar data that MapSPAM does not provide?

  2. What software and methodology was used to extract raster data to ESRI/Garmin vector country boundaries? Zonal statistics methodologies vary widely in their handling of cells that are not entirely contained within the boundaries of the target polygon.

  3. Were interpolated planting and harvest data used? Several countries listed as source countries in the supplementary materials have no valid planting or harvest data. If interpolated data were used, were they validated in any way? The authors of the planting and harvest dataset specifically warn against using the interpolated planting data, as it may contain wild inaccuracies.

  4. It’s not clear precisely how the weighted mean temperature was calculated. Was each temperature cell weighted by the “underlying” cropping fraction cell? The narrated portion of the methods leaves this open to interpretation.

  5. If the temperature was weighted by the spatially corresponding cropping fraction cell, was the cropping fraction data aggregated to match the resolution of the temperature data? Mean surface temperature is 0.5 x 0.5 degree resolution, while the cropping fraction data is 5 arcmin. If the cropping fraction data was aggregated to match the resolution of the surface temperature raster, what method was used?

  6. Were cropping weights adjusted for cell area? Raster cell area decreases as you move poleward. With samples ranging from Russia to South America, this will have a large impact on a weighted zonal extraction.

  7. What was the specific parameterization of the top model presented in the primary manuscript, and what software or packages were used in its implementation? It’s clear the preferred model employed quadratic terms for mean temperature, but it’s not clear exactly what the remaining “country fixed-effects” were. These remaining effects are also not listed in the coefficient table provided in the supplementary material.

  8. Lastly, while significance levels for parameters were provided, what, if any, out-of-sample goodness-of-fit tests were carried out to assess the suitability of the model? This is of particular importance because a large portion of the written narrative focuses on predicting future levels of asylum applications under varying climate scenarios.
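To illustrate why questions 2, 5, and 6 matter in practice, the following R sketch shows one defensible way to perform an area-adjusted, cropping-weighted zonal extraction. The file names, aggregation factor, and package choices (terra, sf, exactextractr) are my assumptions for illustration, not the authors' documented workflow.

```r
library(terra)
library(sf)
library(exactextractr)

## Hypothetical inputs: a 0.5 x 0.5 degree mean surface temperature raster,
## a 5 arcmin cropping fraction raster, and country polygons.
temp      <- rast("temp.tif")
cropfrac  <- rast("cropfrac.tif")
countries <- st_read("countries.gpkg")

## Question 5: aggregate the 5 arcmin cropping fraction to 0.5 degrees
## (a factor of 6), averaging the fractions within each coarse cell.
cropfrac_agg <- aggregate(cropfrac, fact = 6, fun = "mean", na.rm = TRUE)

## Question 6: scale the cropping weights by cell area so that poleward
## cells, which cover less ground, do not receive disproportionate weight.
w <- cropfrac_agg * cellSize(temp, unit = "km")

## Questions 2 and 4: exact_extract() apportions each cell by the fraction
## of its area inside the polygon; this is only one of several zonal
## statistics conventions, which is exactly why the choice should be stated.
countries$crop_temp <- exact_extract(temp, countries, "weighted_mean",
                                     weights = w)
```

Swapping any one of these choices (nearest-neighbor aggregation, no area adjustment, all-touched cell inclusion) would yield different country-level temperatures, which is why the manuscript's silence on them complicates replication.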

Although some of these points are only a matter of procedure that may have limited effect on the model inputs, differences in determining the weighted mean surface temperature and the final model specification can have profound downstream effects. I will attempt to replicate their core model with these considerations in mind. In doing so, I will walk the reader through the data processing steps required to create the core quadratic temperature model. At this time, I will not demonstrate their sensitivity checks, which include the addition of cumulative precipitation data, the use of alternative climate data, the inclusion of conflict data, and future predictions under various climate scenarios. I will carry out this procedure in three steps: 1) data acquisition and pre-processing, 2) visual exploration of the processed data, and 3) enacting the core model and diagnostics.
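The core specification, as I read it from the manuscript, can be sketched in R as a quadratic in weighted mean temperature with country fixed effects. The data frame, column names, and use of the fixest package below are my assumptions; the authors do not document their software or exact parameterization, which is the substance of question 7.

```r
library(fixest)

## asylum: a hypothetical country-year panel with asylum applications,
## cropping-weighted mean growing-season temperature, and a country ID.
## feols() absorbs the country fixed effects after the "|" separator.
m <- feols(log(applications) ~ temperature + I(temperature^2) | country,
           data = asylum)
summary(m)
```

Whether the outcome was logged, which additional fixed effects or trends were included, and how standard errors were clustered all remain open questions that this sketch cannot settle.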

We’ll begin in the next section by preparing the data:

Part II: Data Processing

Additional Meta-Data

Data Categories:

raster, tabular, shapefile, dyadic
