The data that I used for this analysis is hosted at Kaggle. The dataset caught my attention because of the availability of geo spatial data. I recently discovered the
sf package, a tidy package in R built for plotting geometry data. I wanted to work through how to build maps with it because I have always felt a bit limited with other map plotting packages. They seem to only have a set number of default maps built-in, whereas with
sf, you can build anything you want.
The dataset itself deals with the German 2017 Election Results and how votes were distributed among regions. I wanted to visualize the results of the German Election at both the State and District level. In Germany there are two separate votes, and you can learn more about the process here. I primarily the second vote because according to Wikipedia:
For the distribution of seats in the German Bundestag, the second vote is more important than the first vote. This second vote allows the elector to vote for a party whose candidates are put together on the regional electoral list.
The voting system is a lot more nuanced than I had ever imagined, but I figured that the second vote would be more telling of how affiliated voters are with certain parties.
# load packages library(tidyverse) library(stringr) library(sf) library(gridExtra) library(hrbrthemes)
In order to map the Turnout Percentage of the election, I needed to find a way to combine both the election results and the shape file of germany. The
sf package made this super easy because it is a tidy package, meaning I could chain together lines of code, as if I was working with a regular dataframe as opposed to spatial data because each observation had a list column with its geometry data.
# Set path to where you stored your data path <- "your/path/" # Read in Geographic Shape data for Germany germany <- read_sf(paste0(path,"Geometrie_Wahlkreise_19DBT_VG250_geo.shp")) %>% st_transform(31467) # Read in Overall Election Data overall <- read_csv(paste0(path,"2017_german_election_overall.csv")) %>% select(-X1) # Aggregate voting data by State voting.pct <- overall %>% group_by(state) %>% summarise( registered = sum(registered.voters), votes.2 = sum(valid_second_votes) ) %>% mutate( voting.pct.2 = round(votes.2/registered, 3)*100 ) # Join voting data to map data germany <- germany %>% left_join(voting.pct, by=c("LAND_NAME"="state")) # Using sf::st_union() show only the borders of states state.map <- germany %>% group_by(LAND_NAME) %>% summarise(geometry = st_union(geometry))
Plot Voter Turnout over time
Before plotting this information on the map of Germany, I wanted to gather some previous election data, so that I could see whether or not voting behavior had change over the past three elections. The additional data was gathered at https://www.bundeswahlleiter.de.
Fig 1. Code
# Map of Voting % among registered voters map.vote <- state.map %>% left_join(voting.pct, by=c("LAND_NAME"="state")) %>% ggplot() + geom_sf(aes(fill=voting.pct.2), colour="white", lwd=.25) + ggtitle("Voter Turnout by State (2017)", "Lower Turnout in Former East Germany") + scale_fill_distiller(type = "div", palette = "Spectral") + guides(fill=guide_legend(title="Voting %")) + hrbrthemes::theme_ipsum_rc() + theme(legend.position = "bottom", plot.title = element_text(hjust=0.5), plot.subtitle = element_text(hjust=0.5)) # Past Turnout Numbers bundesland <- c("Schleswig-Holstein","Mecklenburg-Vorpommern","Hamburg", "Niedersachsen", "Bremen","Brandenburg", "Sachsen-Anhalt", "Berlin", "Nordrhein-Westfalen", "Sachsen", "Hessen", "Thüringen", "Rheinland-Pfalz", "Bayern", "Baden-Württemberg", "Saarland") pct.2013 <- c(73.1, 65.3, 70.3, 73.4, 68.8, 68.4, 62.1, 72.5, 72.5, 69.5, 73.2, 68.2, 72.8, 70.0, 74.3, 72.5) pct.2009 <- c(73.6, 63.0, 71.3, 73.3, 70.3, 67.0, 60.5, 70.9, 71.4, 65.0, 73.8, 65.2, 72.0, 71.6, 72.4, 73.7) past.df <- tibble(bundesland=bundesland, pct.2013=pct.2013, pct.2009=pct.2009) # Join past data to current voting.pct <- voting.pct %>% left_join(past.df, by=c("state"="bundesland")) # Data table showing same information dt.votes <- voting.pct %>% select( Bundesland=state, `2017`=voting.pct.2, `2013`=pct.2013, `2009`=pct.2009 ) %>% arrange(desc(`2017`)) # Histogram of turnout turnout.hist <- dt.votes %>% gather("Year", "Percentage", 2:4) %>% ggplot(aes(x=Percentage, fill=Year)) + geom_histogram(binwidth=1.5, colour="lightgrey") + facet_wrap(~Year, ncol = 1) + ggtitle("Increase Over Time") + scale_fill_manual(values=c("#4DA4CB","#ffe54c", "#ff4c4c")) + xlab("Turnout %") + ylab("") + hrbrthemes::theme_ipsum_rc() + theme(legend.position = "none") # Output plots grid.arrange(map.vote, turnout.hist, ncol=2, widths=c(1,.75))
Results by Party
For the plots dealing with the individual parties, it was important to identify first the top parties and then clean them (change name & assign them a color) so that they were portrayed accurately and looked nice when plotting.
# Parties that we want to analyze top.parties <- c( "Christlich.Demokratische.Union.Deutschlands", "Christlich.Soziale.Union.in.Bayern.e.V.", "Sozialdemokratische.Partei.Deutschlands", "Alternative.für.Deutschland", "BÜNDNIS.90.DIE.GRÜNEN", "Freie.Demokratische.Partei", "DIE.LINKE" ) # Change parties no in top parties to "Other" party <- read_csv(paste0(path,"2017_german_election_party.csv")) %>% select(-X1) %>% mutate( party=ifelse(!party %in% top.parties, "Other", party) ) # Group party data by area_name total.votes <- party %>% group_by(state, area_name) %>% summarise(total_votes = sum(votes_second_vote)) # Group party data by by area and party total.party <- party %>% group_by(state, area_name, party) %>% summarise(party_votes = sum(votes_second_vote)) # Join total.votes & total.party and calculate percentage party.votes <- left_join(total.party, total.votes) %>% mutate(party.pct = round(party_votes/total_votes, 4)*100) # append "Other" to top.parties top.parties <- append(top.parties, "Other") # Shorthand party names party.names <- c("CDU-CSU", "CDU-CSU", "SPD", "AfD", "GRÜNE", "FDP", "LINKE", "Other") # Data frame to join to voting data df.names <- tibble( party=top.parties, # full names Party=party.names # shorthand names ) # Join Shortened Party names to voting data party.votes <- party.votes %>% left_join(df.names) %>% mutate(WKR_NAME = str_replace_all(area_name, "\\s–\\s", "-")) %>% filter(party_votes > 0) # Read in Geographic Shape data for Germany party.map <- read_sf(paste0(path,"Geometrie_Wahlkreise_19DBT_VG250_geo.shp")) %>% st_transform(31467) %>% mutate(WKR_NAME = str_replace_all(WKR_NAME, "\\s\u0096\\s", "-")) %>% left_join(party.votes)
Distribution of Voting by Party
In these segmented maps, you can see that Afd and LINKE were highly concentrated in former former East Germany, while GRÜNE has a large presence in the West.
Fig 2. Code
# Map of voting party.map %>% filter(Party != "Other") %>% ggplot() + geom_sf(aes(fill=party.pct), colour="#6f6f6f") + scale_fill_gradient2( low="white", mid="yellow", high="red", midpoint = 27 ) + facet_wrap(~Party, ncol=3) + ggtitle("Voting Density for each District by Party") + guides(fill=guide_legend(title="Voting %")) + hrbrthemes::theme_ipsum_rc()
One can imagine which points belong to which districts from the previous plot. Afd and Linke have many outliers that are from the Eastern states. On the contrary, FDP has a similar median as Afd, but a very tight distribution among all districts.
Fig. 3 Code
# Colors to associate with parties party.colors <- c( "#FAC40F", "#0475c9", "#c9ad02", "#49d801", "#8c02c9", "#6f6f6f", "#d82c01" ) # Boxplot party.map %>% ggplot(aes(Party, party.pct)) + geom_jitter(colour="lightgrey") + geom_boxplot(aes(colour=Party), alpha=0.8) + ggtitle("Percentage of Votes among Districts by Party", "Each point ist the % of the district that voted for a particular party") + xlab("") + ylab("") + scale_color_manual(values=party.colors) + hrbrthemes::theme_ipsum_rc() + theme(legend.position = "none")
Notice the how AfD and LINKE have some districts where are large number of voters voted for a particular party
Fig 4. Code
# Aggregate State Votes by Party state.party.votes <- party.votes %>% group_by(state, Party) %>% summarise(party.votes = sum(party_votes)) # Aggregate Votes by State state.total.votes <- state.party.votes %>% group_by(state) %>% summarise(total.votes = sum(party.votes)) # Calculate the Pct of Votes by Party state.pct.votes <- state.party.votes %>% left_join(state.total.votes) %>% mutate(party.pct = round(party.votes/total.votes, 3)*100) # Plot bar charts for each State state.pct.votes %>% ggplot(aes(reorder(Party, party.pct), party.pct, fill=Party)) + geom_bar(stat="identity") + ggtitle("Voting Results for each Party by State") + xlab("") + ylab("") + facet_wrap(~state, ncol=4) + coord_flip() + scale_fill_manual(values=party.colors) + hrbrthemes::theme_ipsum_rc() + theme(legend.position = "none")
These bar charts show how voting for individual parties were distributed among staes. In most states, CDU-CSU dominates the vote; however, that is not always the case. Berlin proves to be a very diverse place to live politically (makes sense). Also AfD in Sachsen is on par with CDU-CSU, which must have been a surprise in this past election.
I hope you learned a bit about
sf and the German election system in this post. I am hoping to learn a lot more about geospatial data in the next coming months. I love the flexibility and capability that you get with this package and how it fits so nicely in the tidyverse.
Jonathan Bouchet for the inspiring Kernel on Kaggle.
hrbrthemes: Bob Rudis. (2017). hrbrthemes: A compilation of extra ‘ggplot2’ themes, scales and utilities, including a spell check function plot label fields and an overall emphasis on typography. https://github.com/hrbrmstr/hrbrthemes
sf: Edzer Pebesma. (2017). sf: support for simple features, a standardized way to encode spatial vector data. https://cran.r-project.org/web/packages/sf/index.html
tidyverse: Rstudio and Inc. (2016). tidyverse: a set of packages that work in harmony because they share common data representations and ‘API’ design. [https://github.com/tidyverse/tidyverse] (https://github.com/tidyverse/tidyverse)