Posted on
image: www.thelocal.de

image: www.thelocal.de


Introduction


The data that I used for this analysis is hosted at Kaggle. The dataset caught my attention because of the availability of geo spatial data. I recently discovered the sf package, a tidy package in R built for plotting geometry data. I wanted to work through how to build maps with it because I have always felt a bit limited with other map plotting packages. They seem to only have a set number of default maps built-in, whereas with sf, you can build anything you want.

The dataset itself deals with the German 2017 Election Results and how votes were distributed among regions. I wanted to visualize the results of the German Election at both the State and District level. In Germany there are two separate votes, and you can learn more about the process here. I primarily the second vote because according to Wikipedia:

For the distribution of seats in the German Bundestag, the second vote is more important than the first vote. This second vote allows the elector to vote for a party whose candidates are put together on the regional electoral list.

The voting system is a lot more nuanced than I had ever imagined, but I figured that the second vote would be more telling of how affiliated voters are with certain parties.



Voter Turnout


Required Packages

# load packages
library(tidyverse)
library(stringr)
library(sf)
library(gridExtra)
library(hrbrthemes)

Data Preparation

In order to map the Turnout Percentage of the election, I needed to find a way to combine both the election results and the shape file of germany. The sf package made this super easy because it is a tidy package, meaning I could chain together lines of code, as if I was working with a regular dataframe as opposed to spatial data because each observation had a list column with its geometry data.

# Set path to where you stored your data
path <- "your/path/"

# Read in Geographic Shape data for Germany
germany <- read_sf(paste0(path,"Geometrie_Wahlkreise_19DBT_VG250_geo.shp")) %>% 
  st_transform(31467)

# Read in Overall Election Data
overall <- read_csv(paste0(path,"2017_german_election_overall.csv")) %>% 
  select(-X1)

# Aggregate voting data by State
voting.pct <- overall %>% 
  group_by(state) %>% 
  summarise(
    registered = sum(registered.voters),
    votes.2 = sum(valid_second_votes)
    ) %>% 
  mutate(
    voting.pct.2 = round(votes.2/registered, 3)*100
    )

# Join voting data to map data
germany <- germany %>% 
  left_join(voting.pct, by=c("LAND_NAME"="state"))

# Using sf::st_union() show only the borders of states
state.map <- germany %>% 
  group_by(LAND_NAME) %>% 
  summarise(geometry = st_union(geometry))


Plot Voter Turnout over time

Before plotting this information on the map of Germany, I wanted to gather some previous election data, so that I could see whether or not voting behavior had change over the past three elections. The additional data was gathered at https://www.bundeswahlleiter.de.

Fig 1. Code

# Map of Voting % among registered voters
map.vote <- state.map %>% 
  left_join(voting.pct, by=c("LAND_NAME"="state")) %>% 
  ggplot() +
  geom_sf(aes(fill=voting.pct.2), colour="white", lwd=.25) +
  ggtitle("Voter Turnout by State (2017)", 
          "Lower Turnout in Former East Germany") +
  scale_fill_distiller(type = "div", palette = "Spectral") +
  guides(fill=guide_legend(title="Voting %")) +
  hrbrthemes::theme_ipsum_rc() +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust=0.5),
        plot.subtitle = element_text(hjust=0.5))

# Past Turnout Numbers 
bundesland <- c("Schleswig-Holstein","Mecklenburg-Vorpommern","Hamburg", "Niedersachsen",
                "Bremen","Brandenburg", "Sachsen-Anhalt", "Berlin", "Nordrhein-Westfalen",
                "Sachsen", "Hessen", "Thüringen", "Rheinland-Pfalz", "Bayern", "Baden-Württemberg", "Saarland")
pct.2013 <- c(73.1, 65.3, 70.3, 73.4, 68.8, 68.4, 62.1, 72.5, 72.5, 69.5, 73.2, 68.2, 72.8, 70.0, 74.3, 72.5)
pct.2009 <- c(73.6, 63.0, 71.3, 73.3, 70.3, 67.0, 60.5, 70.9, 71.4, 65.0, 73.8, 65.2, 72.0, 71.6, 72.4, 73.7)
past.df <- tibble(bundesland=bundesland, pct.2013=pct.2013, pct.2009=pct.2009)

# Join past data to current
voting.pct <- voting.pct %>% 
  left_join(past.df, by=c("state"="bundesland"))

# Data table showing same information
dt.votes <- voting.pct %>% 
  select(
    Bundesland=state, `2017`=voting.pct.2,
    `2013`=pct.2013, `2009`=pct.2009
    ) %>%
  arrange(desc(`2017`))

# Histogram of turnout
turnout.hist <- dt.votes %>% 
  gather("Year", "Percentage", 2:4) %>% 
  ggplot(aes(x=Percentage, fill=Year)) +
  geom_histogram(binwidth=1.5, colour="lightgrey") +
  facet_wrap(~Year, ncol = 1) +
  ggtitle("Increase Over Time") +
  scale_fill_manual(values=c("#4DA4CB","#ffe54c", "#ff4c4c")) +
  xlab("Turnout %") + ylab("") +
  hrbrthemes::theme_ipsum_rc() +
  theme(legend.position = "none")

# Output plots
grid.arrange(map.vote, turnout.hist, ncol=2, widths=c(1,.75))




Results by Party


Data Preparation

For the plots dealing with the individual parties, it was important to identify first the top parties and then clean them (change name & assign them a color) so that they were portrayed accurately and looked nice when plotting.

# Parties that we want to analyze
top.parties <- c(
  "Christlich.Demokratische.Union.Deutschlands", 
  "Christlich.Soziale.Union.in.Bayern.e.V.",
  "Sozialdemokratische.Partei.Deutschlands",
  "Alternative.für.Deutschland", "BÜNDNIS.90.DIE.GRÜNEN", 
  "Freie.Demokratische.Partei", "DIE.LINKE"
  )

# Change parties no in top parties to "Other"
party <- read_csv(paste0(path,"2017_german_election_party.csv")) %>% 
  select(-X1) %>% 
  mutate(
    party=ifelse(!party %in% top.parties, "Other", party)
    )

# Group party data by area_name
total.votes <- party %>%
  group_by(state, area_name) %>% 
  summarise(total_votes = sum(votes_second_vote))

# Group party data by by area and party
total.party <- party %>% 
  group_by(state, area_name, party) %>% 
  summarise(party_votes = sum(votes_second_vote))

# Join total.votes & total.party and calculate percentage
party.votes <- left_join(total.party, total.votes) %>% 
  mutate(party.pct = round(party_votes/total_votes, 4)*100)

# append "Other" to top.parties
top.parties <- append(top.parties, "Other")

# Shorthand party names
party.names <- c("CDU-CSU", "CDU-CSU", "SPD", "AfD", "GRÜNE", "FDP", "LINKE", "Other")

# Data frame to join to voting data
df.names <- tibble(
  party=top.parties,   # full names
  Party=party.names    # shorthand names
  )

# Join Shortened Party names to voting data
party.votes <- party.votes %>% 
  left_join(df.names) %>% 
  mutate(WKR_NAME = str_replace_all(area_name, "\\s–\\s", "-")) %>% 
  filter(party_votes > 0)

# Read in Geographic Shape data for Germany
party.map <- read_sf(paste0(path,"Geometrie_Wahlkreise_19DBT_VG250_geo.shp")) %>% 
  st_transform(31467) %>% 
  mutate(WKR_NAME = str_replace_all(WKR_NAME, "\\s\u0096\\s", "-")) %>% 
  left_join(party.votes)

Distribution of Voting by Party

In these segmented maps, you can see that Afd and LINKE were highly concentrated in former former East Germany, while GRÜNE has a large presence in the West.

Fig 2. Code

# Map of voting 
party.map %>%
    filter(Party != "Other") %>%
    ggplot() +
    geom_sf(aes(fill=party.pct), colour="#6f6f6f") +
    scale_fill_gradient2(
      low="white", mid="yellow", high="red", midpoint = 27
      ) +
    facet_wrap(~Party, ncol=3) +
    ggtitle("Voting Density for each District by Party") +
    guides(fill=guide_legend(title="Voting %")) +
    hrbrthemes::theme_ipsum_rc()

One can imagine which points belong to which districts from the previous plot. Afd and Linke have many outliers that are from the Eastern states. On the contrary, FDP has a similar median as Afd, but a very tight distribution among all districts.

Fig. 3 Code

# Colors to associate with parties
party.colors <- c(
  "#FAC40F", "#0475c9", "#c9ad02", "#49d801", "#8c02c9", "#6f6f6f", "#d82c01"
  )

# Boxplot
party.map %>% 
  ggplot(aes(Party, party.pct)) +
  geom_jitter(colour="lightgrey") +
  geom_boxplot(aes(colour=Party), alpha=0.8) +
  ggtitle("Percentage of Votes among Districts by Party",
  "Each point ist the % of the district that voted for a particular party") +
  xlab("") + ylab("") +
  scale_color_manual(values=party.colors) +
  hrbrthemes::theme_ipsum_rc() +
  theme(legend.position = "none")

Notice the how AfD and LINKE have some districts where are large number of voters voted for a particular party

Fig 4. Code

# Aggregate State Votes by Party
state.party.votes <- party.votes %>% 
  group_by(state, Party) %>% 
  summarise(party.votes = sum(party_votes))

# Aggregate Votes by State
state.total.votes <- state.party.votes %>% 
  group_by(state) %>% 
  summarise(total.votes = sum(party.votes))

# Calculate the Pct of Votes by Party
state.pct.votes <- state.party.votes %>% 
  left_join(state.total.votes) %>% 
  mutate(party.pct = round(party.votes/total.votes, 3)*100)

# Plot bar charts for each State
state.pct.votes %>% 
  ggplot(aes(reorder(Party, party.pct), party.pct, fill=Party)) +
  geom_bar(stat="identity") +
  ggtitle("Voting Results for each Party by State") + 
  xlab("") + ylab("") +
  facet_wrap(~state, ncol=4) +
  coord_flip() +
  scale_fill_manual(values=party.colors) +
  hrbrthemes::theme_ipsum_rc() +
  theme(legend.position = "none")

These bar charts show how voting for individual parties were distributed among staes. In most states, CDU-CSU dominates the vote; however, that is not always the case. Berlin proves to be a very diverse place to live politically (makes sense). Also AfD in Sachsen is on par with CDU-CSU, which must have been a surprise in this past election.



Conclusion


I hope you learned a bit about sf and the German election system in this post. I am hoping to learn a lot more about geospatial data in the next coming months. I love the flexibility and capability that you get with this package and how it fits so nicely in the tidyverse.



Acknowledgements


comments powered by Disqus