
Tracking migration patterns through Eastern and Southern Europe with Shiny

Contributed by Diego De Lazzari. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, taking place from July 5th to September 23rd, 2016. This post is based on his second project, R Shiny (due in the 4th week of the program). The R code can be found on GitHub, while the app is hosted on Shinyapps.io. The original article can be found here.

This project attempts to visualize the migration patterns followed by over a million migrants in the last 18 months by means of an interactive map developed in Shiny. While offering a dynamic picture of the migration flow through Eastern Europe, from Greece through the Balkans up to Austria, the project also analyzes its composition in terms of country of origin and gender.

Introduction


According to the United Nations High Commissioner for Refugees (UNHCR), the total number of refugees worldwide in 2016 is estimated at 14.5 million people. When internally displaced and "stateless" individuals are included, the total population of concern reaches a shocking 58 million, largely located (~75%) in Africa and Asia. While it is hard to conceive of such a huge flow of people fleeing war, poverty or persecution in their countries of origin, the consequences of these displacements have recently become a central topic of debate within the European Union. Over the last 8 years, more than 1.7 million migrants have reached Southern Europe, either through Turkey or across the Mediterranean Sea. At the same time, another 2.8 million registered Syrian refugees are currently located in Turkey.

Browsing the interactive map


The UN refugee agency continuously collects the daily arrivals per country, allowing a precise mapping of the migration flow and thereby supporting the emergency response plan. While this project focuses on the arrivals, gender and origin data recorded between October 2015 and June 2016, the complete database can be found here.
As shown in Fig. 1, the Shiny application is laid out as a dashboard: the sidebar is used for navigation, while content is displayed in the main tab.

Fig 1: Overview of the Shiny App

The Balkan route view shows the daily arrivals, combining a map (either as a single frame or as an animation) with a time series for each country. As expected, the flow is rather discontinuous, with a number of "spikes" propagating from Greece to Macedonia (FYROM), Serbia, Croatia and Austria. The flow of migrants splits between Slovenia and Hungary up to mid 2015, when the latter closes its border, forbidding any further access. A similar policy applies to Albania and Montenegro. Despite such limitations, the flow does not seem to stop. A comparison among the 6 countries involved shows essentially the same trend.

 


Fig 2: Interactive map showing the migration through Eastern Europe. The color bar represents arrivals per day, for the date selected with the slider at the bottom. The time series plotted below the map allows comparing the arrivals in different countries, averaged over a given number of days (in the picture, the series are averaged over 7 days).
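Averaging the series over 7 days amounts to a rolling mean. As a rough sketch (not the dygraphs implementation itself), a trailing 7-day mean of a daily-arrivals vector can be computed in base R; the function name roll_mean and the sample values are hypothetical, chosen only for illustration:

```r
# Trailing rolling mean: element i is the average of points i-k+1 .. i.
# The first k-1 values are NA because the window is still incomplete.
roll_mean <- function(x, k = 7) {
  # stats::filter is called explicitly because dplyr masks filter()
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

daily <- c(10, 20, 30, 40, 50, 60, 70, 80)  # illustrative values, not UNHCR data
roll_mean(daily, k = 7)
# [1] NA NA NA NA NA NA 40 50
```

A larger window gives smoother curves at the cost of hiding short spikes, which is the trade-off the user controls in the app.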

As mentioned in the introduction, the second and most dangerous route towards Europe crosses central Africa and the Mediterranean Sea. While the Balkan route is quite well defined, both geographically and ethnically, the African route is much more complex: it involves most North African countries, from Morocco to Egypt, and multiple destinations such as Spain, Malta and Italy. In the last 18 months, about 100,000 migrants reached the Italian coast, mostly from Algeria, Libya and Egypt. Surprisingly, only 25% of the arrivals are refugees, while the majority come from Nigeria, Eritrea, Gambia, Cote d'Ivoire and several other countries. The difficulties and risks of the African route have a clear effect on the gender distribution: women and children account for only 26% of the total arrivals in Italy, against an estimated 48% in Greece. Overall, in both 2015 and 2016 the number of registered minors was larger than the number of women, for a total of 300,000 arrivals.

Fig 3: Figures at a glance. For a given country (destination), the picture shows the distribution of the migrants by country of origin and gender.
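The percentages quoted above are shares of a per-destination total. A minimal sketch of that calculation in base R, assuming a long table with hypothetical columns Destination, Group and Count; the values below are made up for illustration and are not UNHCR figures:

```r
# Made-up counts for two hypothetical destinations (illustration only)
arrivals <- data.frame(
  Destination = rep(c("X", "Y"), each = 3),
  Group       = rep(c("Men", "Women", "Children"), times = 2),
  Count       = c(60, 25, 15, 30, 30, 40)
)

# Percentage of women and children among the arrivals of each destination
share_women_children <- function(df) {
  total <- tapply(df$Count, df$Destination, sum)
  is_wc <- df$Group %in% c("Women", "Children")
  wc    <- tapply(df$Count[is_wc], df$Destination[is_wc], sum)
  round(100 * wc[names(total)] / total, 1)
}

share_women_children(arrivals)
# X: 40, Y: 70
```

The same grouping by country of origin instead of by group would give the origin breakdown shown in Fig 3.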

Future steps


Due to the time constraints of the project, the application focuses mostly on the Balkan route and on "hosting countries". Moreover, the data are downloaded and processed directly, without exploiting the flexibility offered by the UNHCR API. In the near future, the app will be completed by merging the two routes into a single map and updating the underlying statistics in real time. I would also like to develop a similar map for the "countries of origin", in order to provide a complete migration pattern, from the country of origin to the actual destination.

Appendix: Developing the Shiny App


In this final section, I briefly describe the essential steps taken during the development of the web application. As the title suggests, the app is built with Shiny, using the Shiny Dashboard framework (shinydashboard package) for R.
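For readers unfamiliar with the framework, a shinydashboard page is split into a header, a sidebar used for navigation and a body holding the tabs. A skeleton of that layout might look as follows; the menu labels and the tab ids balkan, glance and data are assumptions for illustration, not the app's actual ids:

```r
library(shiny)
library(shinydashboard)

# Sidebar: one menu entry per tab; tabName links an entry to a tabItem
sidebar <- dashboardSidebar(
  sidebarMenu(
    menuItem("Balkan route", tabName = "balkan"),
    menuItem("Figures at a glance", tabName = "glance"),
    menuItem("Data", tabName = "data")
  )
)

# Body: the content shown for each menu entry (placeholders here)
body <- dashboardBody(
  tabItems(
    tabItem(tabName = "balkan", h2("Map and time series")),
    tabItem(tabName = "glance", h2("Origin and gender")),
    tabItem(tabName = "data",   h2("Data table"))
  )
)

ui <- dashboardPage(dashboardHeader(title = "Migration"), sidebar, body)
server <- function(input, output, session) {}

# shinyApp() only builds the app object; runApp(app) would launch it
app <- shinyApp(ui, server)
```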

Building the Table


One of the first steps in the development of the app was to build a reactive table depending on two inputs: a chosen dataset and a set of columns (multiple choices allowed). Since the first input determines the choices available for the second, I used a reactive observer. In contrast with a regular reactive expression (which uses lazy evaluation), an observer executes its content as soon as its dependencies change (i.e. it uses eager evaluation).

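# at the top of app.R
library(shiny)
library(DT)

# --- inside server <- function(input, output, session) ---

# Data frame and column names matching the chosen dataset
dataChoice <- reactive({
  switch(input$whichData,
         'Daily Arrivals' = list(data = balkanRoute,    cols = col_balkanRoute),
         'Gender'         = list(data = dataGender2016, cols = col_gender),
         list(data = dataOrigin2016, cols = col_demography))
})

# Observer: as soon as the dataset changes, refresh the column choices
observe({
  col <- dataChoice()$cols
  updateSelectInput(session, "selected", choices = col,
                    selected = col[seq_len(min(4, length(col)))])
})

# Show data using DataTable; defined once, outside the observer
output$table <- DT::renderDataTable({
  # drop stale selections left over from the previous dataset
  sel <- intersect(input$selected, names(dataChoice()$data))
  req(length(sel) > 0)
  DT::datatable(dataChoice()$data[, sel, drop = FALSE],
                rownames = FALSE, selection = 'multiple') %>%
    DT::formatStyle(sel, background = "skyblue", fontWeight = 'bold')
})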
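A self-contained version of the same pattern can be run locally. The sketch below swaps in R's built-in mtcars and iris data frames for the UNHCR tables (an assumption for illustration) and keeps the two-input design: an observer refreshes the column choices whenever the dataset changes, and the table shows only the selected columns.

```r
library(shiny)
library(DT)

# Built-in datasets stand in for the UNHCR tables (illustration only)
datasets <- list("Cars" = mtcars, "Flowers" = iris)

ui <- fluidPage(
  selectInput("whichData", "Dataset", choices = names(datasets)),
  selectInput("selected", "Columns", choices = NULL, multiple = TRUE),
  DT::dataTableOutput("table")
)

server <- function(input, output, session) {
  # Reactive expression: lazily returns the chosen data frame
  chosen <- reactive(datasets[[input$whichData]])

  # Observer: runs eagerly whenever the dataset changes and
  # refreshes the column choices, pre-selecting up to four
  observe({
    cols <- names(chosen())
    updateSelectInput(session, "selected", choices = cols,
                      selected = cols[seq_len(min(4, length(cols)))])
  })

  output$table <- DT::renderDataTable({
    # skip selections that do not belong to the current dataset
    cols <- intersect(input$selected, names(chosen()))
    req(length(cols) > 0)
    DT::datatable(chosen()[, cols, drop = FALSE], rownames = FALSE)
  })
}

app <- shinyApp(ui, server)  # launch with runApp(app)
```

Keeping the render function outside the observer means the output is registered once, and Shiny re-renders it automatically whenever input$selected or the dataset changes.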

Maps and time series


Once the datasets were available and processed, I created the main map. To do so, I used the plotly package for the map and dygraphs for the time series. The latter allows comparing arrivals across countries over time, or following a single country over a chosen period. The averaging window (i.e. the smoothness of the curves) can be set by the user.
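dygraph() plots a time-indexed object such as an xts series, with one column per country. The post does not show how balkanTimeSeries is built; a sketch, assuming a long data frame with columns Date, Country and Arrivals (hypothetical names and made-up values), reshapes it to one column per country with base R and wraps it in xts. tidyr::spread() would do the same reshaping step.

```r
library(xts)

# Hypothetical long-format table: one row per (Date, Country), made-up values
arrivals_long <- data.frame(
  Date     = rep(as.Date("2016-01-01") + 0:2, times = 2),
  Country  = rep(c("Greece", "Serbia"), each = 3),
  Arrivals = c(100, 120, 90, 80, 95, 70)
)

# Reshape to wide format: one row per date, one column per country
arrivals_wide <- reshape(arrivals_long, idvar = "Date", timevar = "Country",
                         direction = "wide")
names(arrivals_wide) <- sub("^Arrivals\\.", "", names(arrivals_wide))

# Time-indexed object that dygraph() can plot directly
arrivals_ts <- xts(arrivals_wide[, -1], order.by = arrivals_wide$Date)
```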

# Create Map (plotly >= 4.0 syntax: columns are referenced with ~)
library(plotly)
library(dplyr)

output$map <- renderPlotly({
  # arrivals on the date selected with the slider
  sel.data <- filter(balkanRoute.map, Date == input$slider.map)

  # light grey boundaries
  l <- list(color = toRGB("grey"), width = 0.5)

  # specify map projection/options (one projection entry instead of two)
  g <- list(
    scope = 'world',
    projection = list(type = 'mercator', scale = 1),
    showframe = TRUE,
    showcoastlines = TRUE,
    lataxis = list(range = c(30, 50)),
    lonaxis = list(range = c(-10, 40)),
    showsubunits = TRUE,
    showcountries = TRUE
  )

  plot_ly(sel.data, type = 'choropleth',
          z = ~Arrivals, text = ~Country, locations = ~Code,
          color = ~Arrivals, colors = 'Blues', marker = list(line = l)) %>%
    colorbar(title = 'Arrivals') %>%
    # country names drawn as a text layer on top of the choropleth
    add_trace(data = country_codes, type = 'scattergeo', mode = 'text',
              locations = ~Code, text = ~Country, inherit = FALSE) %>%
    # no width = "100%": layout sizes are in pixels and renderPlotly fills its container
    layout(title = 'Daily arrivals across the Balkans', geo = g)
})

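# Plot time series (library(dygraphs); balkanTimeSeries is assumed to be an xts object)
output$arrivals_by_day <- renderDygraph({
  dygraph(balkanTimeSeries) %>%
    dyHighlight(highlightCircleSize = 5,
                highlightSeriesBackgroundAlpha = 0.2,
                hideOnMouseOut = FALSE) %>%
    dyRangeSelector() %>%
    # rolling average over 7 points; the user can change it in the roller box
    dyRoller(rollPeriod = 7)
})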
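The map reads its date from input$slider.map, and the map can be played either as a single frame or as an animation. The UI side is not shown in the post; one way to provide it is a date sliderInput with animate enabled, which adds a play button. The range below matches the October 2015 to June 2016 window mentioned earlier, while the 500 ms interval and the date format are arbitrary choices:

```r
library(shiny)

# Date slider feeding input$slider.map; step = 1 moves one day at a time
map_slider <- sliderInput(
  inputId = "slider.map", label = "Date",
  min = as.Date("2015-10-01"), max = as.Date("2016-06-30"),
  value = as.Date("2015-10-01"), step = 1,
  timeFormat = "%d %b %Y",
  # play button: advance one step every 500 ms and stop at the end
  animate = animationOptions(interval = 500, loop = FALSE)
)
```

Placed in the dashboard body, this single control drives both the static view (dragging the handle) and the animation (pressing play).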


View on GitHub.

Written in R using RStudio. Deployed on shinyapps.io.

Packages used:

  • shiny
  • shinydashboard
  • DT
  • xts
  • dplyr
  • tidyr
  • plotly
  • dygraphs

© 2019 Data Science Central ®