Exploring United Nations Refugee & Asylum Data With the untools Package

Joshua Brinks, ISciences, LLC


  • The untools package provides functions to easily acquire and visualize UN refugee and asylum seeker time series data.
  • untools also provides convenience functions for fixing country name typos, removing stateless entries, and introducing common country codes.



Note: This vignette was updated in August 2020 to reflect changes to the UNHCR Data API, the available data, and the resulting streamlining of the untools package.

The Office of the United Nations High Commissioner for Refugees (UNHCR) provides several data sets describing annual movements of populations of concern. These include asylum seekers, asylum application data, asylum application decisions, refugees, internally displaced persons (tracked by UNHCR and IDMC), returned refugees, returned internally displaced persons, stateless persons, Palestinian refugees, Venezuelans displaced abroad, and other populations of concern. The UNHCR Refugee Data Finder web portal serves as a central hub for several data sets summarizing these populations by year, month, gender, age, origin, and destination. The portal is a fine exploratory tool but can be cumbersome for research purposes. In this vignette we summarize and explore the UNHCR Population dataset, the most commonly used UNHCR dataset for tracking refugees and asylum seekers. It consists of annual dyadic flows for all populations of concern between countries of origin (citizenship) and countries of destination (asylum/residency). Although records begin in 1951, exercise caution when performing analysis and causal inference for years prior to 1990.

Getting Started

Installing untools

The Populations of Concern dataset can be acquired directly using the getUNref() function from the untools package. You can install the current release of untools from GitLab with the devtools package.

devtools::install_gitlab("dante-sttr/untools", dependencies = TRUE)

Acquiring the Data

Load both untools and data.table for some light data wrangling. Then use getUNref() to download the most recent data from the UNHCR API.

library(untools)
library(data.table)
unhcr.ts<-getUNref()

In addition to the getUNref() function demonstrated in this vignette, the dataset can be downloaded directly from the UNHCR Refugee Data Finder web portal. The Time Series dataset is one of the cleanest data sets provided by the UNHCR; however, it might be challenging for a beginning programmer. The untools package is designed to provide simplified tools for data acquisition, processing, and visualization of popular United Nations data sets.

year  coo_id  coo_name  coo  coo_iso  coa_id  coa_name   coa  coa_iso  refugees  asylum_seekers  returned_refugees  idps  returned_idps  stateless  ooc  vda
1951  262     Unknown   UKN  NULL     11      Australia  AUL  AUS      180000    0               0                  0     0              0          0    NA
1951  262     Unknown   UKN  NULL     12      Austria    AUS  AUT      282000    0               0                  0     0              0          0    NA
1951  262     Unknown   UKN  NULL     17      Belgium    BEL  BEL      55000     0               0                  0     0              0          0    NA
1951  262     Unknown   UKN  NULL     33      Canada     CAN  CAN      168511    0               0                  0     0              0          0    NA
1951  262     Unknown   UKN  NULL     50      Denmark    DEN  DNK      2000      0               0                  0     0              0          0    NA

This is a fairly simple dataset, consisting of the year of observation (year), the country of origin (coo, coo_name), the destination/asylum country (coa, coa_name), and fields for the different population types (refugees, asylum_seekers, returned_refugees, idps, returned_idps, stateless, ooc, and vda). Previous iterations of this dataset included numerous special characters, unusual formatting, and other issues not conducive to programmatic research and analysis; however, the July 2020 revision of the UNHCR datasets has largely removed these concerns. Please refer to the getUNref() help file for more information on the different populations of concern and the additional fields not addressed in this vignette.

#>  [1] "year"              "coo_id"            "coo_name"          "coo"              
#>  [5] "coo_iso"           "coa_id"            "coa_name"          "coa"              
#>  [9] "coa_iso"           "refugees"          "asylum_seekers"    "returned_refugees"
#> [13] "idps"              "returned_idps"     "stateless"         "ooc"              
#> [17] "vda"

untools Functions for UNHCR Data

The prepUNref Function

The untools package provides several functions for processing and visualizing UNHCR data. The prepUNref() function helps process raw UNHCR time series data by converting it to wide or long form, selecting years of interest, selecting populations of interest, and summing across groups. Calling prepUNref() with no additional parameters subsets the data to refugees and asylum_seekers only and converts it from wide to long form, which is more conducive to visualization and analysis.

year  coo_id  coo_name  coo  coo_iso  coa_id  coa_name   coa  coa_iso  type      persons
1951  262     Unknown   UKN  NULL     11      Australia  AUL  AUS      refugees  180000
1951  262     Unknown   UKN  NULL     12      Austria    AUS  AUT      refugees  282000
1951  262     Unknown   UKN  NULL     17      Belgium    BEL  BEL      refugees  55000
1951  262     Unknown   UKN  NULL     33      Canada     CAN  CAN      refugees  168511
1951  262     Unknown   UKN  NULL     50      Denmark    DEN  DNK      refugees  2000
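To make the wide-to-long step concrete, the sketch below reproduces it in base R on the first five rows shown earlier, keeping only the refugees and asylum_seekers columns. It is an illustrative approximation, not the untools implementation; the to_long() helper is hypothetical, and base R is used so the snippet runs without untools or a network connection.

```r
# Hypothetical helper, not part of untools: stack the chosen population
# columns into a single `type` / `persons` pair of columns (wide -> long).
to_long <- function(df, groups = c("refugees", "asylum_seekers")) {
  id_cols <- setdiff(names(df), groups)
  blocks <- lapply(groups, function(g) {
    out <- df[, id_cols, drop = FALSE]  # identifier columns for every row
    out$type <- g                       # population type for this block
    out$persons <- df[[g]]              # counts taken from column `g`
    out
  })
  do.call(rbind, blocks)                # one block per type, stacked
}

# First rows of the raw data shown above (other population columns dropped).
raw <- data.frame(
  year = 1951,
  coo_name = "Unknown",
  coa_iso = c("AUS", "AUT", "BEL", "CAN", "DNK"),
  refugees = c(180000, 282000, 55000, 168511, 2000),
  asylum_seekers = 0
)

long <- to_long(raw)
print(long[long$type == "refugees", ])
```

Since data.table is already loaded in this workflow, data.table::melt() with measure.vars does the same reshaping in a single call.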

There are several records with Unknown as the country of origin. These records are not trivial, but for this exploration we focus on known dyadic flows between countries.

year  coo_id  coo_name  coo  coo_iso  coa_id  coa_name                 coa  coa_iso  type      persons
1960  6       Angola    ANG  AGO      41      Dem. Rep. of the Congo   COD  COD      refugees  150000
1961  161     Rwanda    RWA  RWA      16      Burundi                  BDI  BDI      refugees  30000
1961  6       Angola    ANG  AGO      41      Dem. Rep. of the Congo   COD  COD      refugees  150000
1961  161     Rwanda    RWA  RWA      41      Dem. Rep. of the Congo   COD  COD      refugees  53000
1961  161     Rwanda    RWA  RWA      186     United Rep. of Tanzania  TAN  TZA      refugees  12000
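One way to drop the Unknown origins is a plain logical subset. The base-R sketch below assumes, as the printed rows suggest, that unknown origins carry coo_name "Unknown" and coo code "UKN"; the rows are copied from the tables above, and this is not necessarily how the vignette filtered them.

```r
# Long-form rows copied from the tables above.
flows <- data.frame(
  year     = c(1951, 1951, 1960, 1961),
  coo_name = c("Unknown", "Unknown", "Angola", "Rwanda"),
  coo      = c("UKN", "UKN", "ANG", "RWA"),
  coa_iso  = c("AUS", "AUT", "COD", "BDI"),
  type     = "refugees",
  persons  = c(180000, 282000, 150000, 30000)
)

# Keep only dyads with a known origin: drop rows flagged as unknown
# by either the origin name or the origin code (assumed encodings).
known <- flows[flows$coo_name != "Unknown" & flows$coo != "UKN", ]
print(known)
```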

By default, prepUNref() selects all years and all affected populations, but the user can specify populations and years of interest by using the groups and range options. For example, specifying groups = c('refugees') and range = c(2000,2017) will only process refugees between 2000 and 2017.

unhcr.ts.dante<-prepUNref(unhcr.ts, groups = c('refugees'), range = c(2000, 2017))
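Conceptually, the groups and range options act as a row filter on the long data. The sketch below expresses that idea with a hypothetical subset_unref() helper in base R; it is not prepUNref(), and treating both ends of range as inclusive is an assumption. The rows are copied from the tables above, including the zero asylum_seekers count for Unknown to Australia in 1951.

```r
# Hypothetical helper, not untools' prepUNref(): keep the requested
# population types and years from range[1] to range[2] (assumed inclusive).
subset_unref <- function(df, groups, range) {
  keep <- df$type %in% groups &
    df$year >= range[1] & df$year <= range[2]
  df[keep, ]
}

flows <- data.frame(
  year     = c(1951, 1951, 1960, 1961, 1961),
  coo_name = c("Unknown", "Unknown", "Angola", "Angola", "Rwanda"),
  coa_iso  = c("AUS", "AUS", "COD", "COD", "COD"),
  type     = c("refugees", "asylum_seekers", "refugees", "refugees", "refugees"),
  persons  = c(180000, 0, 150000, 150000, 53000)
)

refugees_60s <- subset_unref(flows, groups = c("refugees"), range = c(1960, 1961))
print(refugees_60s)
```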

Lastly, prepUNref() provides two additional logical switches: wide and sum_groups. By default, prepUNref() returns long data frames. This is most convenient for plotting and modeling, but it is sometimes useful to explore data in wide form, especially time series data sets. The sum_groups option aggregates the totals across all specified groups. Let’s use these two switches to look at the combined Syrian refugee and asylum seeker outflows to Germany between 2014 and 2017 in wide form (wide = TRUE).

unhcr.ts.dante<-prepUNref(unhcr.ts, groups = c('refugees', 'asylum_seekers'), range = c(2014, 2017), sum_groups=TRUE, wide=TRUE)

coo_id  coo_name          coo  coo_iso  coa_id  coa_name  coa  coa_iso  2014    2015    2016    2017
185     Syrian Arab Rep.  SYR  SYR      72      Germany   GFR  DEU      70585   197186  475649  567507
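Summing groups and switching to wide form amount to a cross-tabulation: add up persons for every origin, destination, and year, then spread the years across columns. The base-R sketch below does this with xtabs() on refugee rows copied from the earlier table; it illustrates the idea rather than reproducing prepUNref(). Only one population type appears in these rows, but rows of other types for the same dyad and year would be added into the same cell.

```r
flows <- data.frame(
  year    = c(1960, 1961, 1961, 1961, 1961),
  coo_iso = c("AGO", "RWA", "AGO", "RWA", "RWA"),
  coa_iso = c("COD", "BDI", "COD", "COD", "TZA"),
  type    = "refugees",
  persons = c(150000, 30000, 150000, 53000, 12000)
)

# One label per origin-destination pair.
flows$dyad <- paste(flows$coo_iso, flows$coa_iso, sep = "->")

# xtabs() sums `persons` over every row sharing a dyad and year (so all
# types are pooled) and writes 0 for dyad-years with no rows.
wide <- as.data.frame.matrix(xtabs(persons ~ dyad + year, data = flows))
print(wide)
```

Note that xtabs() fills absent dyad-years with 0; whether prepUNref() uses 0 or NA in that case is not covered here.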

Static Grouped Flows

With more than 100,000 unique country-country-year records, outflows, inflows, and varying populations of interest, visualizing the UNHCR data can be overwhelming. The untools package provides multiple default plotting functions for objects produced by the prepUNref() function. An easy starting point for investigating flows between countries is a static bar plot of dyadic flows into or out of a target country. Using plot() on an object produced by prepUNref() with sum_groups = TRUE produces a bar plot for the target country and its top 8 origin or destination countries. The user specifies the country of interest, a year of interest, and whether to view inflows (mode = 'in') or outflows (mode = 'out'). Let’s start by viewing combined asylum seeker and refugee inflows to the United States in 2013.

unhcr.ts.dante<-untools::prepUNref(unhcr.ts, groups = c('asylum_seekers', 'refugees'), sum_groups = TRUE, range = c(2012, 2017))
usa.in<-plot(unhcr.ts.dante, country = 'USA', mode = 'in', yr = c(2013, 2013))

Somewhat surprisingly, China tops the list, while Central American countries round out the rest of the top 5. By default, plot() lists up to 8 countries and uses the maximum year in the prepared dataset if no year(s) are specified. For example, we can view asylum seeker outflows from the Philippines in 2019, the latest year in the prepared data, with a simple call.

unhcr.ts.dante<-untools::prepUNref(unhcr.ts, groups = c('asylum_seekers'), range = c(2012, 2019))
phl.out<-plot(unhcr.ts.dante, country = 'PHL', mode = 'out')
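The ranking behind these bar plots can be sketched in a few lines of base R. The top_partners() helper below is hypothetical and is not the untools plot() method: it filters to one target country and year (falling back to the latest year, as described above), sums persons by partner country, and keeps the largest n, 8 by default. The data are the refugee rows from the earlier table.

```r
# Hypothetical helper, not untools' plot() method: rank partner countries
# for one target country and year by total persons.
top_partners <- function(df, country, mode = c("in", "out"), yr = NULL, n = 8) {
  mode <- match.arg(mode)
  if (is.null(yr)) yr <- max(df$year)  # default: latest year in the data
  if (mode == "in") {
    sel <- df[df$coa_iso == country & df$year == yr, ]
    partner <- sel$coo_iso             # inflows: rank origins
  } else {
    sel <- df[df$coo_iso == country & df$year == yr, ]
    partner <- sel$coa_iso             # outflows: rank destinations
  }
  totals <- tapply(sel$persons, partner, sum)
  totals <- setNames(as.vector(totals), names(totals))  # plain named vector
  head(sort(totals, decreasing = TRUE), n)
}

flows <- data.frame(
  year    = c(1960, 1961, 1961, 1961, 1961),
  coo_iso = c("AGO", "RWA", "AGO", "RWA", "RWA"),
  coa_iso = c("COD", "BDI", "COD", "COD", "TZA"),
  persons = c(150000, 30000, 150000, 53000, 12000)
)

print(top_partners(flows, country = "COD", mode = "in"))  # uses 1961
print(top_partners(flows, country = "RWA", mode = "out", yr = 1961))
```

Passing the result to base R's barplot() draws a rough version of the chart.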

Stacked Static Flows by Population Type

Up until this point we’ve visualized cumulative migrant flows across all groups or for a single group, but it may be of interest to examine the relative proportions of asylum seekers, refugees, and stateless persons. The untools package provides default plotting functions to visualize stacked bar charts of migrant inflows or outflows by group. Let’s re-examine inflows of migrants to the USA in 2017, this time broken down by type. To keep the breakdown by affected population, specify sum_groups = FALSE.

unhcr.stacked<-untools::prepUNref(unhcr.ts, groups = c('asylum_seekers', 'refugees','stateless'), range = c(2000, 2017), sum_groups = FALSE)
usa.stacked.in<-plot(unhcr.stacked, country = 'USA', mode = 'in')
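A stacked chart needs one bar per partner country, split into segments by population type. The base-R sketch below builds that matrix with xtabs(). The inflow numbers and the AAA/BBB/CCC/DDD codes are made up purely for illustration; they are not UNHCR figures, and the code is not the untools implementation.

```r
# Made-up inflows to one destination in one year (illustration only).
inflows <- data.frame(
  coo_iso = c("AAA", "BBB", "BBB", "CCC", "CCC"),
  coa_iso = "DDD",
  year    = 2017,
  type    = c("asylum_seekers", "refugees", "asylum_seekers",
              "refugees", "stateless"),
  persons = c(100, 500, 300, 200, 50)
)

# Origins in rows, population types in columns; empty combinations are 0.
by_type <- xtabs(persons ~ coo_iso + type, data = inflows)

# Put the origin with the largest combined total first.
by_type <- by_type[order(rowSums(by_type), decreasing = TRUE), , drop = FALSE]
print(by_type)
```

In base R, barplot(t(by_type), legend.text = TRUE) then draws one stacked bar per origin, with a segment for each population type.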

Plotting Time Series

Although static plots of migrant flows are interesting, it’s often more illuminating to examine time series of migrant inflows and outflows. The untools package also provides default plotting functions to visualize time series migrant flows for data frames produced with prepUNref() using sum_groups = TRUE. The default plotting function produces a plot for all years present in the prepared data, using the 5 countries with the highest totals in the maximum year of the dataset. Let’s view annual cumulative refugee and asylum seeker inflows to the USA from 2000 to 2017.

unhcr.ts.dante<-prepUNref(unhcr.ts, groups = c('asylum_seekers', 'refugees'), range = c(2000, 2017), sum_groups = TRUE)
usa.ts.in<-plot(unhcr.ts.dante, country = 'USA', mode = 'in')
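The selection rule described above, ranking countries by their totals in the maximum year and then following the top 5 across all years, can be sketched in base R. The top_series() helper is hypothetical, and the series uses made-up numbers with placeholder country codes rather than UNHCR data.

```r
# Made-up annual inflows to one destination (illustration only).
series <- data.frame(
  year    = rep(2015:2017, times = 3),
  coo_iso = rep(c("AAA", "BBB", "CCC"), each = 3),
  persons = c(10, 20, 30,   50, 40, 5,   15, 25, 35)
)

# Hypothetical helper: rank origins by their total in the latest year,
# then return every year of data for the top n of them.
top_series <- function(df, n = 5) {
  latest <- df[df$year == max(df$year), ]
  totals <- tapply(latest$persons, latest$coo_iso, sum)
  keep <- head(names(totals)[order(totals, decreasing = TRUE)], n)
  df[df$coo_iso %in% keep, ]
}

top2 <- top_series(series, n = 2)
print(top2)
```

Here BBB drops out even though it led in 2015, because only the latest year drives the ranking.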

Lastly, similar to the static default plotting functions, we can specify mode = 'out' to view outflows from a given country.

phl.ts.out<-plot(unhcr.ts.dante, country = 'PHL', mode = 'out')

Additional Meta-Data

Data Categories:

tabular, time series, dyadic
