Note: This vignette was updated August, 2020 to reflect changes to the UNHCR Data API, available data, and subsequent streamlining of the
The Office of the United Nations High Commissioner for Refugees (UNHCR) provides several data sets describing annual movements of populations of concern. These include asylum seekers, asylum application data, asylum application decisions, refugees, internally displaced persons (UNHCR and IDMC tracked), returned refugees, returned internally displaced persons, stateless persons, Palestinian refugees, Venezuelans displaced abroad, and other populations of concern. The UNHCR Refugee Data Finder web portal serves as a central hub for several data sets summarizing these populations by year, month, gender, age, origin, and destination. The web portal is a fine exploratory tool, but it can be cumbersome for research purposes. Today we will summarize and explore the most commonly used UNHCR population dataset, which tracks refugees and asylum seekers. This dataset consists of annual dyadic flows for all populations of concern between countries of origin (citizenship) and countries of destination (asylum/residency). Although records begin in 1951, exercise caution when performing analysis and causal inference for years prior to 1990.
The Populations of Concern dataset can be acquired directly using the getUNref() function from the untools package. You can install the current release of untools from GitLab with the
Acquiring the Data
data.table for some light data wrangling. Then use getUNref() to download the most recent data from the UNHCR API.
In addition to the getUNref() function we demonstrate in this vignette, the dataset is available for download directly from the UNHCR Refugee Data Finder web portal. The Time Series dataset is one of the cleanest data sets provided by the UNHCR; however, it might still be challenging for a beginning programmer. The untools package is designed to provide simplified tools for data acquisition, processing, and visualization for popular United Nations data sets.
This is a fairly simple dataset, consisting of the year of observation (year), country of origin (coo, coo_name), destination/asylum country (coa, coa_name), and fields for the different population types (for example, vda). Previous iterations of this dataset included numerous special characters, unusual formatting, and other issues not conducive to programmatic research and analysis; however, the July 2020 revision of the UNHCR datasets has largely removed these concerns. Please refer to the getUNref help file for more information on the different populations of concern and additional fields not addressed in this vignette.
untools Functions for UNHCR Data
The prepUNref Function
The untools package provides several functions for processing and visualizing UNHCR data. The prepUNref() function helps process raw UNHCR time series data by converting it to wide or long form, selecting years of specific interest, selecting populations of interest, and summing across groups. Using prepUNref() with no additional parameters will subset the data for only asylum_seekers and convert the data from wide to long form, which is more conducive to visualization and analysis.
There are several records with Unknown as the country of origin. While these are not trivial, for this exploration we will focus on known dyadic flows between countries.
| 1960 | 6 | Angola | ANG | AGO | 41 | Dem. Rep. of the Congo | COD | COD | refugees | 150000 |
| 1961 | 6 | Angola | ANG | AGO | 41 | Dem. Rep. of the Congo | COD | COD | refugees | 150000 |
| 1961 | 161 | Rwanda | RWA | RWA | 41 | Dem. Rep. of the Congo | COD | COD | refugees | 53000 |
| 1961 | 161 | Rwanda | RWA | RWA | 186 | United Rep. of Tanzania | TAN | TZA | refugees | 12000 |
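The wide-to-long conversion and the removal of Unknown origins described above can be illustrated with a minimal base R sketch. This is not how prepUNref() is implemented; the toy table, its column names, and its values are assumptions made purely for illustration.

```r
# Toy wide-form table. Column names and values are illustrative assumptions,
# not actual getUNref() output.
wide <- data.frame(
  year           = c(2016, 2017, 2017),
  coo_name       = c("Origin A", "Origin A", "Unknown"),
  coa_name       = c("Asylum B", "Asylum B", "Asylum B"),
  refugees       = c(100, 120, 8),
  asylum_seekers = c(10, 15, 2),
  stringsAsFactors = FALSE
)

id_cols    <- c("year", "coo_name", "coa_name")
group_cols <- c("refugees", "asylum_seekers")

# Stack each population column into (group, value) pairs so that every row
# holds one year-origin-destination-group observation.
long <- do.call(rbind, lapply(group_cols, function(g) {
  data.frame(wide[, id_cols], group = g, value = wide[[g]],
             stringsAsFactors = FALSE)
}))

# Keep only records with a known country of origin.
known <- long[long$coo_name != "Unknown", ]
print(known)
```

Each row of the long table carries a single value, which is the shape most plotting and modeling functions expect.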
prepUNref() selects all years and all affected populations, but the user can specify populations and years of interest by using the groups and range options. For example, specifying groups = c('refugees') and range = c(2000,2017) will only process refugees between 2000 and 2017.
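Conceptually, selecting groups and a year range amounts to a row filter on the long-form table. The base R sketch below shows that idea on toy data; the column names are assumptions, and treating both endpoints of the range as inclusive is also an assumption rather than documented prepUNref() behavior.

```r
# Toy long-form table; values are made up for illustration only.
long <- data.frame(
  year  = c(1999, 2000, 2010, 2017, 2018, 2010),
  group = c("refugees", "refugees", "refugees", "refugees", "refugees",
            "asylum_seekers"),
  value = c(5, 10, 20, 30, 40, 7),
  stringsAsFactors = FALSE
)

groups   <- c("refugees")
yr_range <- c(2000, 2017)  # assumed inclusive on both ends

# Keep rows whose group is requested and whose year falls inside the range.
keep <- long$group %in% groups &
  long$year >= yr_range[1] &
  long$year <= yr_range[2]
subset_long <- long[keep, ]
print(subset_long)
```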
prepUNref() provides two additional logical switches: wide and sum_groups. By default, prepUNref() returns long data frames. This is most convenient for plotting and modeling; however, it is sometimes interesting to explore data in wide form, especially for time series data sets. Moreover, the sum_groups option will aggregate the totals across all specified groups. Let’s use these two switches to look at the sum of Syrian refugee and asylum-seeking outflows to Germany between 2014 and 2017 using sum_groups = TRUE and wide = TRUE.
| 185 | Syrian Arab Rep. | SYR | SYR | 72 | Germany | GFR | DEU | 70585 | 197186 | 475649 | 567507 |
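A result of this shape, with one row per origin-destination pair and one column per year, can be reproduced in base R by summing across groups and then reshaping to wide form. The sketch below uses aggregate() and reshape() on made-up numbers; it illustrates the idea and is not the package's own code, and the column names are assumptions.

```r
# Toy long-form table with two population groups; values are illustrative only.
long <- data.frame(
  year     = c(2014, 2014, 2015, 2015),
  coo_name = "Origin A",
  coa_name = "Asylum B",
  group    = c("refugees", "asylum_seekers", "refugees", "asylum_seekers"),
  value    = c(100, 10, 150, 30),
  stringsAsFactors = FALSE
)

# Sum across groups within each year and origin-destination pair.
summed <- aggregate(value ~ year + coo_name + coa_name, data = long, FUN = sum)

# Spread the yearly totals into one column per year.
wide <- reshape(summed,
                idvar     = c("coo_name", "coa_name"),
                timevar   = "year",
                direction = "wide")

# reshape() names the new columns value.<year>; strip the prefix.
names(wide) <- sub("^value\\.", "", names(wide))
print(wide)
```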
Static Grouped Flows
With more than 100,000 unique country-country-year records, outflows, inflows, and varying populations of interest, visualizing the UNHCR data can be overwhelming. The untools package provides multiple default plotting functions for objects produced by the prepUNref() function. An easy launching point for investigating flows between countries is a static bar plot of dyadic flows into or out of a target country. Using plot() on an object produced with sum_groups = TRUE will produce a bar plot for the target country and the top 8 destination or origin countries. The user specifies the country of interest, a year of interest, and whether they want to view inflows (mode = 'in') or outflows (mode = 'out'). Let’s start by viewing asylum-seeking inflows to the United States in 2013.
Somewhat surprisingly, China tops the list, while Central America rounds out the rest of the top 5. By default, plot() will list up to 8 countries and will use the maximum year in the prepared dataset if no other year(s) are specified. We can view asylum-seeking outflows from the Philippines in 2019 with a simple call.
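The selection behind such a bar plot can be sketched in base R: filter to the target country and year, pick the partner column according to the direction of flow, and keep the largest values. The helper top_flows() below is hypothetical and written only for this illustration; it is not part of untools, and the toy data and column names are assumptions.

```r
# Toy summed flows; "Target" stands in for the country of interest.
flows <- data.frame(
  year     = 2013,
  coo_name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "Target"),
  coa_name = c(rep("Target", 9), "A"),
  value    = c(50, 400, 120, 75, 310, 20, 95, 260, 5, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top-n partner countries for inflows or outflows.
top_flows <- function(flows, country, yr, mode = c("in", "out"), n = 8) {
  mode <- match.arg(mode)
  # Inflows: the target is the asylum country and partners are origins.
  # Outflows: the target is the origin and partners are asylum countries.
  target_col  <- if (mode == "in") "coa_name" else "coo_name"
  partner_col <- if (mode == "in") "coo_name" else "coa_name"
  keep <- flows[flows[[target_col]] == country & flows$year == yr, ]
  keep <- keep[order(keep$value, decreasing = TRUE), ]
  out <- head(keep[, c(partner_col, "value")], n)
  names(out) <- c("partner", "value")
  out
}

top_in  <- top_flows(flows, "Target", 2013, mode = "in")
top_out <- top_flows(flows, "Target", 2013, mode = "out")

# Static bar plot of the eight largest inflows.
barplot(top_in$value, names.arg = top_in$partner,
        ylab = "Persons", main = "Top inflows to Target, 2013")
```

Switching mode only swaps which column is treated as the target and which as the partner, so the same ranking logic serves both directions.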
Stacked Static Flows by Population Type
Up until this point we’ve visualized cumulative migrant flows across all groups or a single group, but it may be of interest to examine the relative proportions of asylum seekers, refugees, and stateless persons. The untools package provides default plotting functions to visualize stacked bar charts of migrant inflows or outflows by group. Let’s re-examine inflows of migrants to the USA in 2017, but this time include breakdowns by type. To maintain the breakdown by affected population, specify sum_groups = FALSE.
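A stacked chart of this kind needs a group-by-country matrix rather than a single total per country. The base R sketch below builds one with xtabs() from toy long-form data and passes it to barplot(); the data and column names are assumptions, and this is not the plotting code used by untools.

```r
# Toy long-form flows kept separate by population type (made-up values).
by_type <- data.frame(
  coo_name = rep(c("A", "B", "C"), each = 2),
  group    = rep(c("refugees", "asylum_seekers"), times = 3),
  value    = c(50, 5, 30, 20, 10, 15),
  stringsAsFactors = FALSE
)

# Rows are population types, columns are countries.
stacked <- xtabs(value ~ group + coo_name, data = by_type)

# With a matrix-like input, barplot() stacks the rows within each bar.
barplot(stacked, legend.text = TRUE, ylab = "Persons")
```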
Plotting Time Series
Although static plots of migrant flows are interesting, it’s often more illuminating to examine time series data for migrant inflows and outflows. The untools package also provides default plotting functions to visualize time series migrant flows for data frames produced with sum_groups = TRUE. The default plotting function will produce a plot for all years present in the raw data using the 5 countries with the highest totals in the maximum year of the dataset. Let’s view annual cumulative refugee and asylum-seeking inflows to the USA from 2000 to 2017.
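The rule described above, ranking countries by their total in the latest year and then plotting every year for the top five, can be sketched in base R as follows. The toy data, column names, and plotting calls are assumptions for illustration and do not reproduce the package's plotting function.

```r
# Toy summed inflows: 7 origin countries over 3 years (made-up values).
flows <- expand.grid(coo_name = paste0("C", 1:7), year = 2015:2017,
                     stringsAsFactors = FALSE)
flows$value <- rep(1:7, times = 3) * 10 + rep(0:2, each = 7)

# Rank countries by their total in the latest year and keep the top 5.
max_year <- max(flows$year)
latest   <- flows[flows$year == max_year, ]
top5     <- head(latest$coo_name[order(latest$value, decreasing = TRUE)], 5)

# Year-by-country matrix holding all years for the selected countries.
series  <- flows[flows$coo_name %in% top5, ]
ts_wide <- tapply(series$value, list(series$year, series$coo_name), sum)

# One line per country across all years.
matplot(as.numeric(rownames(ts_wide)), ts_wide, type = "l", lty = 1,
        xlab = "Year", ylab = "Persons")
legend("topleft", legend = colnames(ts_wide), col = 1:5, lty = 1)
```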
Lastly, similar to the static default plotting functions, we can specify mode = 'out' to view outflows from a given country.