The year 2016 marked a new era for America, as the presidency of Barack Obama drew to a close and Donald Trump was elected to office. Just four years later, another dramatic change occurred when Joe Biden was elected President of the United States in November 2020. From President Obama to President Trump to President-elect Biden, the party of the country's leader has changed from Democratic to Republican and back to Democratic. More interesting than this change in party was the voter turnout in 2020: more ballots were cast than in any previous U.S. presidential election. In addition, many public figures called this the most important election the country has ever had, which could be one reason for the large increase in voter turnout from 2016 to 2020.
Because Joe Biden, a Democrat, won the election, states that voted Republican in 2016 had to flip in order for him to win. Of the states that flipped, a few large ones played an important role in Biden's win. The states that my partner Adit and I chose to research were Arizona, Georgia, Michigan, Pennsylvania, and Wisconsin. For each of these states, we first wanted to see which counties accounted for the largest share of the state's total votes. These highly populated counties contribute the most to the election of either candidate. Next, we checked how voter turnout in each county changed between the two elections. Lastly, we made maps of each state that color-code the increase in turnout by county.
To begin our analysis and comparison of the 2016 and 2020 elections, we will first take a look at the voter turnout and demographic data from each election. To do this, we pull our election data from the sources linked below. I loaded the 2016 and 2020 data and created a dataframe from each. We will use the vote data to see which counties in our selected states accounted for the largest share of the state's total vote count.
The county-level election results are scraped from results published by Townhall.com. Their formatted election tables make it easy for pandas to create a dataframe of the results. This data was converted into a CSV and added to a GitHub repository by the user 'tonmcg', which is where I obtained the data.
For more information on the 2016 dataset, visit GitHub 2016 Election Data
For more information on the 2020 dataset, visit GitHub 2020 Election Data
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import numpy as np
import seaborn
df_2016 = pd.read_csv("election2016/2016_US_County_Level_Presidential_Results.csv")
df_2016 = df_2016.drop(['Unnamed: 0'], axis=1) # Getting rid of extra index column
df_2016['percent_votes'] = 0
df_2016
Here is the 2016 data per county in the US.
df_2020 = pd.read_csv('election2020/2020_US_County_Level_Presidential_Results.csv')
df_2020['percent_votes'] = 0
df_2020
Here is the election data for 2020.
Now we will look at each of the states that flipped between 2016 and 2020.
AZ_data_2016 = df_2016[df_2016['state_abbr'] == "AZ"]
AZ_data_2016
AZ_data_2020 = df_2020[df_2020['state_name'] == 'Arizona']
AZ_data_2020
GA_data_2016 = df_2016[df_2016['state_abbr'] == "GA"]
GA_data_2016
GA_data_2020 = df_2020[df_2020['state_name'] == 'Georgia']
GA_data_2020
MI_data_2016 = df_2016[df_2016['state_abbr'] == "MI"]
MI_data_2016
MI_data_2020 = df_2020[df_2020['state_name'] == 'Michigan']
MI_data_2020
PA_data_2016 = df_2016[df_2016['state_abbr'] == "PA"]
PA_data_2016
PA_data_2020 = df_2020[df_2020['state_name'] == "Pennsylvania"]
PA_data_2020
WI_data_2016 = df_2016[df_2016['state_abbr'] == 'WI']
WI_data_2016
WI_data_2020 = df_2020[df_2020['state_name'] == 'Wisconsin']
WI_data_2020
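The ten filtering cells above all follow the same pattern, so they could also be written as two dictionary comprehensions. The sketch below is only an illustration: `toy_2016` and `toy_2020` are hypothetical stand-ins for `df_2016` and `df_2020`, and the `STATES` mapping is a name introduced here. The column names `state_abbr` and `state_name` are the ones used in the cells above.

```python
import pandas as pd

# Abbreviation -> full name for the five states of interest
STATES = {"AZ": "Arizona", "GA": "Georgia", "MI": "Michigan",
          "PA": "Pennsylvania", "WI": "Wisconsin"}

# Hypothetical stand-ins for df_2016 and df_2020
toy_2016 = pd.DataFrame({"state_abbr": ["AZ", "GA", "TX"], "total_votes": [10, 20, 30]})
toy_2020 = pd.DataFrame({"state_name": ["Arizona", "Georgia", "Texas"], "total_votes": [15, 25, 35]})

# One filtered dataframe per state; .copy() makes each one independent of the
# source table, so adding columns later does not raise SettingWithCopyWarning
data_2016 = {abbr: toy_2016[toy_2016["state_abbr"] == abbr].copy() for abbr in STATES}
data_2020 = {abbr: toy_2020[toy_2020["state_name"] == name].copy()
             for abbr, name in STATES.items()}

print(data_2016["AZ"])
print(data_2020["GA"])
```

With this layout, `data_2016["AZ"]` would play the role of `AZ_data_2016`, and the later per-state steps could loop over `STATES` instead of being repeated by hand.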
Now that we have created datasets for the five states of interest, it is time to fill in the 'percent_votes' column of each table. This column shows what share of the state's total vote count came from each county. To compute it, we first sum the total votes across the state and then divide each county's vote count by that statewide total.
After adding this information to each state's dataframe, we can visualize it with pie charts that show how each state's votes are distributed across its counties.
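As a minimal sketch of the calculation just described, the share of the statewide vote can be computed in one vectorized step. The county names and vote counts here are made up for illustration; only the column names `county_name`, `total_votes`, and `percent_votes` come from the notebook.

```python
import pandas as pd

# Hypothetical county totals, used only to illustrate the calculation
toy_state = pd.DataFrame({
    "county_name": ["Alpha County", "Beta County", "Gamma County"],
    "total_votes": [600, 300, 100],
})

# Statewide total, then each county's share of it
state_total = toy_state["total_votes"].sum()
toy_state = toy_state.assign(percent_votes=toy_state["total_votes"] / state_total)

# Rank counties from largest to smallest share and print them as percentages
ranked = toy_state.sort_values("percent_votes", ascending=False)
for name, share in zip(ranked["county_name"], ranked["percent_votes"]):
    print(f"{name}: {share * 100:.2f}%")
```

Because the shares are computed from a single statewide total, they should always sum to 1, which is a quick sanity check before plotting a pie chart.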
total_AZ_vote_2016 = int(AZ_data_2016['total_votes'].sum())
print(total_AZ_vote_2016)
total_AZ_vote_2020 = AZ_data_2020['total_votes'].sum()
print(total_AZ_vote_2020)
The state of Arizona had a total of 2062810 votes during the 2016 Election.
The state of Arizona had a total of 3387326 votes during the 2020 Election.
With the statewide totals in hand, we can now update the 'percent_votes' value for each county with the share of the state's votes that the county accounts for.
After inserting the data into the table, we can use matplotlib to create a pie chart of each county's percentage of the total votes. Underneath the pie chart, I also included a legend listing every county and its percentage of votes, rounded to 2 decimal places for easier viewing.
# Fill the 'percent_votes' column (each county's share of the statewide vote)
AZ_data_2016 = AZ_data_2016.assign(percent_votes=AZ_data_2016['total_votes'] / total_AZ_vote_2016)
AZ_data_2020 = AZ_data_2020.assign(percent_votes=AZ_data_2020['total_votes'] / total_AZ_vote_2020)
# Visualizing the Percent Voter Column - 2016
plt.figure(figsize=(10,10))
plt.pie(AZ_data_2016['percent_votes'], labels = AZ_data_2016['county_name'], autopct='%1.2f%%')
plt.title('Arizona 2016: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
AZ_data_2016 = AZ_data_2016.sort_values(by=['percent_votes'], ascending=False)
# Visualizing the Percent Voter Column - 2020
plt.figure(figsize=(10,10))
plt.pie(AZ_data_2020['percent_votes'], labels = AZ_data_2020['county_name'], autopct='%1.2f%%')
plt.title('Arizona 2020: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
AZ_data_2020 = AZ_data_2020.sort_values(by=['percent_votes'], ascending=False)
print('The 2016 county rankings are:')
for index, row in AZ_data_2016.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
print('\n\n')
print('The 2020 county rankings are:')
for index, row in AZ_data_2020.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
The most interesting difference between the two elections is that Arizona's total went from 2062810 votes in 2016 to 3387326 in 2020, an increase of roughly 64%. Surprisingly, Maricopa County, the top vote collector in both 2016 and 2020 and home to Phoenix, Arizona's capital and largest city, increased its share of the statewide vote.
# Total state vote, print, explanation
total_GA_vote_2016 = int(GA_data_2016['total_votes'].sum())
print(total_GA_vote_2016)
total_GA_vote_2020 = GA_data_2020['total_votes'].sum()
print(total_GA_vote_2020)
The state of Georgia had a total of 4029564 votes during the 2016 Election.
The state of Georgia had a total of 4997716 votes during the 2020 Election.
# Fill the 'percent_votes' column (each county's share of the statewide vote)
GA_data_2016 = GA_data_2016.assign(percent_votes=GA_data_2016['total_votes'] / total_GA_vote_2016)
GA_data_2020 = GA_data_2020.assign(percent_votes=GA_data_2020['total_votes'] / total_GA_vote_2020)
# Visualizing the Percent Voter Column - 2016
plt.figure(figsize=(10,10))
plt.pie(GA_data_2016['percent_votes'], labels=GA_data_2016['county_name'], autopct='%1.2f%%')
plt.title('Georgia 2016: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
GA_data_2016 = GA_data_2016.sort_values(by=['percent_votes'], ascending=False)
# Visualizing the Percent Voter Column - 2020
plt.figure(figsize=(10,10))
plt.pie(GA_data_2020['percent_votes'], labels = GA_data_2020['county_name'], autopct='%1.2f%%')
plt.title('Georgia 2020: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
GA_data_2020 = GA_data_2020.sort_values(by=['percent_votes'], ascending=False)
print('The 2016 county rankings are:')
for index, row in GA_data_2016.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
print('\n\n')
print('The 2020 county rankings are:')
for index, row in GA_data_2020.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
The counties with the highest percentages are those with heavily populated cities, and they are clustered around Atlanta. Georgia also saw an increase in votes, gaining nearly one million between the elections.
# Total state vote, print, explanation
total_MI_vote_2016 = int(MI_data_2016['total_votes'].sum())
print(total_MI_vote_2016)
# Total state vote, print, explanation
total_MI_vote_2020 = MI_data_2020['total_votes'].sum()
print(total_MI_vote_2020)
The state of Michigan had a total of 4790917 votes during the 2016 election.
The state of Michigan had a total of 5539302 votes during the 2020 election.
# Fill the 'percent_votes' column (each county's share of the statewide vote)
MI_data_2016 = MI_data_2016.assign(percent_votes=MI_data_2016['total_votes'] / total_MI_vote_2016)
MI_data_2020 = MI_data_2020.assign(percent_votes=MI_data_2020['total_votes'] / total_MI_vote_2020)
# Visualizing the Percent Voter Column - 2016
plt.figure(figsize=(10,10))
plt.pie(MI_data_2016['percent_votes'], labels=MI_data_2016['county_name'], autopct='%1.2f%%')
plt.title('Michigan 2016: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
MI_data_2016 = MI_data_2016.sort_values(by=['percent_votes'], ascending=False)
# Visualizing the Percent Voter Column - 2020
plt.figure(figsize=(10,10))
plt.pie(MI_data_2020['percent_votes'], labels = MI_data_2020['county_name'], autopct='%1.2f%%')
plt.title('Michigan 2020: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
MI_data_2020 = MI_data_2020.sort_values(by=['percent_votes'], ascending=False)
print('The 2016 county rankings are:')
for index, row in MI_data_2016.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
print('\n\n')
print('The 2020 county rankings are:')
for index, row in MI_data_2020.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
Michigan also saw an increase of roughly 750,000 votes between the two elections. The majority of the votes come from the Detroit area.
# Total state vote, print, explanation
total_PA_vote_2016 = int(PA_data_2016['total_votes'].sum())
print(total_PA_vote_2016)
total_PA_vote_2020 = int(PA_data_2020['total_votes'].sum())
print(total_PA_vote_2020)
The state of Pennsylvania had a total of 5970107 votes during the 2016 election.
The state of Pennsylvania had a total of 6925255 votes during the 2020 election.
# Fill the 'percent_votes' column (each county's share of the statewide vote)
PA_data_2016 = PA_data_2016.assign(percent_votes=PA_data_2016['total_votes'] / total_PA_vote_2016)
PA_data_2020 = PA_data_2020.assign(percent_votes=PA_data_2020['total_votes'] / total_PA_vote_2020)
# Creating plot and printing percent
plt.figure(figsize=(10,10))
plt.pie(PA_data_2016['percent_votes'], labels=PA_data_2016['county_name'], autopct='%1.2f%%')
plt.title('Pennsylvania 2016: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
PA_data_2016 = PA_data_2016.sort_values(by=['percent_votes'], ascending=False)
# Visualizing the Percent Voter Column - 2020
plt.figure(figsize=(10,10))
plt.pie(PA_data_2020['percent_votes'], labels = PA_data_2020['county_name'], autopct='%1.2f%%')
plt.title('Pennsylvania 2020: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
PA_data_2020 = PA_data_2020.sort_values(by=['percent_votes'], ascending=False)
print('The 2016 county rankings are:')
for index, row in PA_data_2016.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
print('\n\n')
print('The 2020 county rankings are:')
for index, row in PA_data_2020.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
Pennsylvania had an increase of nearly 1 million votes between the elections, and its biggest cities, Philadelphia and Pittsburgh, had the largest influence in the election.
# Total state vote, print, explanation
total_WI_vote_2016 = int(WI_data_2016['total_votes'].sum())
print(total_WI_vote_2016)
total_WI_vote_2020 = WI_data_2020['total_votes'].sum()
print(total_WI_vote_2020)
The state of Wisconsin had a total of 2937326 votes during the 2016 Election.
The state of Wisconsin had a total of 3297352 votes during the 2020 Election.
# Fill the 'percent_votes' column (each county's share of the statewide vote)
WI_data_2016 = WI_data_2016.assign(percent_votes=WI_data_2016['total_votes'] / total_WI_vote_2016)
WI_data_2020 = WI_data_2020.assign(percent_votes=WI_data_2020['total_votes'] / total_WI_vote_2020)
# Creating plot and printing percent
plt.figure(figsize=(10,10))
plt.pie(WI_data_2016['percent_votes'], labels=WI_data_2016['county_name'], autopct='%1.2f%%')
plt.title('Wisconsin 2016: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
WI_data_2016 = WI_data_2016.sort_values(by=['percent_votes'], ascending=False)
# Visualizing the Percent Voter Column - 2020
plt.figure(figsize=(10,10))
plt.pie(WI_data_2020['percent_votes'], labels = WI_data_2020['county_name'], autopct='%1.2f%%')
plt.title('Wisconsin 2020: Percent of Votes per County in Relation to the State')
plt.axis('equal')
plt.show()
WI_data_2020 = WI_data_2020.sort_values(by=['percent_votes'], ascending=False)
print('The 2016 county rankings are:')
for index, row in WI_data_2016.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
print('\n\n')
print('The 2020 county rankings are:')
for index, row in WI_data_2020.iterrows():
    print(row['county_name'] + ": " + str(round(row['percent_votes']*100, 2)) + "%")
Wisconsin had a smaller increase in total votes, about 360,000, between the two elections. Its cities of Madison and Milwaukee are the biggest population centers and hold a great deal of influence.
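The statewide totals reported in the sections above can be compared directly by computing the absolute and percentage change for each state. This is a small sketch in plain Python; the `percent_change` helper and the `state_totals` dictionary are names introduced here, while the numbers are the totals printed earlier.

```python
# Statewide totals as reported in the sections above: (2016, 2020)
state_totals = {
    "Arizona": (2062810, 3387326),
    "Georgia": (4029564, 4997716),
    "Michigan": (4790917, 5539302),
    "Pennsylvania": (5970107, 6925255),
    "Wisconsin": (2937326, 3297352),
}

def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100

changes = {state: percent_change(v16, v20)
           for state, (v16, v20) in state_totals.items()}

# Print the raw gain in votes alongside the percentage change
for state, pct in changes.items():
    v16, v20 = state_totals[state]
    print(f"{state}: +{v20 - v16:,} votes ({pct:.1f}%)")
```

Keep in mind that these changes are only as accurate as the county-level totals in the two source files.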
After finding each county's share of its state's votes and visualizing it with pie charts, we now want to look at our data in relation to the total population of each state. We will pull Census data for the five states we are interested in. The data covers the years 2010-2019, but we drop the years we do not need, because we are focusing on the 2016 and 2020 elections. Because we do not have population data for 2020, we use the 2019 number in its place.
Now we will read the CSV files and create a dataframe of the total population per county for each state.
arizonaPop = pd.read_csv('changeInPop/arizona.csv')
arizonaPop.rename(columns = {'Unnamed: 0': 'County/State'}, inplace=True)
arizonaPop = arizonaPop.drop(['2010', '2011', '2012', '2013', '2014', '2015', '2017', '2018', 'Census', 'Estimates Base'], axis=1)
arizonaPop
georgiaPop = pd.read_csv('changeInPop/georgia.csv')
georgiaPop.rename(columns={'Unnamed: 0': 'County/State'}, inplace=True)
georgiaPop = georgiaPop.drop(['2010', '2011', '2012', '2013', '2014', '2015', '2017', '2018', 'Census', 'Estimates Base'], axis=1)
georgiaPop
michiganPop = pd.read_csv('changeInPop/michigan.csv')
michiganPop.rename(columns={'Unnamed: 0': 'County/State'}, inplace=True)
michiganPop = michiganPop.drop(['2010', '2011', '2012', '2013', '2014', '2015', '2017', '2018', 'Census', 'Estimates Base'], axis=1)
michiganPop
pennsylvaniaPop = pd.read_csv('changeInPop/pennsylvania.csv')
pennsylvaniaPop.rename(columns={'Unnamed: 0': 'County/State'}, inplace=True)
pennsylvaniaPop = pennsylvaniaPop.drop(['2010', '2011', '2012', '2013', '2014', '2015', '2017', '2018', 'Census', 'Estimates Base'], axis=1)
pennsylvaniaPop
wisconsinPop = pd.read_csv('changeInPop/wisconsin.csv')
wisconsinPop.rename(columns={'Unnamed: 0': 'County/State'}, inplace=True)
wisconsinPop = wisconsinPop.drop(['2010', '2011', '2012', '2013', '2014', '2015', '2017', '2018', 'Census', 'Estimates Base'], axis=1)
wisconsinPop
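The five loading cells above repeat the same steps, so they could be folded into one helper. The sketch below is an assumption-laden illustration: `load_population` and `KEEP_YEARS` are hypothetical names, and the in-memory CSV is made-up data that only mimics the layout implied by the cells above (an unnamed first column plus 'Census', 'Estimates Base', and year columns). It selects the columns to keep rather than listing the ones to drop.

```python
import io
import pandas as pd

# Years the analysis keeps; every other column is discarded
KEEP_YEARS = ["2016", "2019"]

def load_population(csv_source):
    """Read a Census population CSV and keep only the name column and KEEP_YEARS."""
    pop = pd.read_csv(csv_source)
    pop = pop.rename(columns={"Unnamed: 0": "County/State"})
    return pop[["County/State"] + KEEP_YEARS]

# Hypothetical two-row file standing in for one of the changeInPop CSVs
sample_csv = io.StringIO(
    ',Census,Estimates Base,2015,2016,2017,2018,2019\n'
    'Examplestate,"1,000","1,000","1,100","1,150","1,200","1,250","1,300"\n'
    '".Alpha County, Examplestate","400","400","440","460","480","500","520"\n'
)
sample_pop = load_population(sample_csv)
print(sample_pop)
```

Under these assumptions, a call such as `load_population('changeInPop/arizona.csv')` would take the place of the three Arizona lines above.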
We would now like to visualize the voter turnout in 2016 and 2020. To do this, we use the Census data loaded above to find the percentage of people living in each county who voted. We take the 'total_votes' from the 2016 and 2020 election tables and divide it by the total population of that county.
We first need to use regex and a few other techniques to merge the two tables so that all of the information we need is in one place.
One thing to note is that when merging the tables and creating a 'Voter Turnout' column, we needed to convert the county population values to floats, because they were stored as strings containing commas. After this, we can divide the total votes of each county by the county's total population to get the voter turnout.
NOTE: Our voter turnout percentages include children, non-citizens, and others who are not allowed to vote in the population count. For a true voter turnout, the voting-age population should be used. Our figure is better read as the percentage of all residents who voted than as a true voter turnout.
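The conversion and division described above can be sketched on a tiny table. The counties and numbers are hypothetical; the column names match the ones used in the merged tables below. The population strings are cleaned with `str.replace` and `astype(float)` before dividing.

```python
import pandas as pd

# Hypothetical merged table; the real one comes from joining election and Census data
toy = pd.DataFrame({
    "County Name": ["Alpha County", "Beta County"],
    "2016 Vote Count": [50000, 1200],
    "2016 County Population": ["125,000", "4,800"],
})

# Strip the thousands separators, then convert the strings to floats
toy["2016 County Population"] = (
    toy["2016 County Population"].str.replace(",", "", regex=False).astype(float)
)

# Share of all residents (not just eligible voters) who cast a ballot
toy["2016 Voter Turnout"] = toy["2016 Vote Count"] / toy["2016 County Population"]
print(toy)
```

Swapping the denominator for a voting-age population column, if one were available, would turn this into a conventional turnout rate.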
# Editing the 'arizonaPop' table so that the county names are standalone
import re
arizonaPop['county_name'] = arizonaPop['County/State'].str.extract(r'([a-zA-Z]*\s*[a-zA-Z]*\s*County)')
arizonaPop = arizonaPop.drop(columns=['County/State'], axis=1)
After using regex to extract just the county name for each entry (the first entry in the table is the state total, which is why it comes up as NaN), we rearrange the columns to keep the dataframe organized, with the county names at the front of the table.
After rearranging the data, we can merge our two tables together in order to create a visualization of the voter turnout.
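One caveat about the extraction step: a pattern that allows at most two purely alphabetic words before "County" would not capture names that contain a period or more than two words in full. The sketch below shows one possible alternative. It assumes the Census labels look like ".Apache County, Arizona", which is not confirmed by the notebook, and `COUNTY_PATTERN` and `extract_county` are hypothetical names.

```python
import re

# Assumed label format: ".<County Name> County, <State>".
# The state-total row contains no "County" and therefore does not match.
COUNTY_PATTERN = re.compile(r"^\.?\s*(.*?\bCounty)\b")

def extract_county(label):
    """Return the county name from a Census label, or None if there is none."""
    match = COUNTY_PATTERN.search(label)
    return match.group(1) if match else None

# Hypothetical labels in the assumed format
labels = [
    "Arizona",
    ".Apache County, Arizona",
    ".St. Clair County, Michigan",
    ".Fond du Lac County, Wisconsin",
]
for label in labels:
    print(repr(label), "->", extract_county(label))
```

The same pattern string could be passed to pandas `Series.str.extract` (for example with `expand=False`), which leaves NaN for rows that do not match. Whether the extracted names then line up with `county_name` in the election tables would still need to be checked after the merge.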
# Moving the Columns around in order for the County Names to be shown first
cols = arizonaPop.columns.tolist()
cols = cols[-1:] + cols[:-1]
arizonaPop = arizonaPop[cols]
# Creating merged table and dropping appropriate columns to make data look good
merged_arizona_data = pd.merge(left=AZ_data_2016, right=arizonaPop, on='county_name')
merged_arizona_data.rename(columns={'2019': '2020 County Population', '2016': '2016 County Population', 'state_abbr':
'State Abbrev.', 'total_votes': '2016 Vote Count'}, inplace=True)
no_need_2016 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop',
'diff', 'per_point_diff', 'combined_fips', 'percent_votes']
no_need_2020 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop', 'diff', 'per_point_diff', 'State Abbrev.', 'percent_votes']
merged_arizona_data = merged_arizona_data.drop(columns=no_need_2016, axis=1)
merged_arizona_data = pd.merge(left=AZ_data_2020, right=merged_arizona_data, on='county_name')
merged_arizona_data = merged_arizona_data.drop(columns=no_need_2020, axis=1)
merged_arizona_data.rename(columns={'total_votes': '2020 Vote Count'}, inplace=True)
merged_arizona_data.rename(columns={'county_name': 'County Name'}, inplace=True)
merged_arizona_data = merged_arizona_data[['state_name', 'County Name', 'county_fips', '2016 Vote Count',
'2020 Vote Count', '2016 County Population', '2020 County Population']]
# Converting the county population values from comma-formatted strings to floats
for col in ['2016 County Population', '2020 County Population']:
    merged_arizona_data[col] = merged_arizona_data[col].str.replace(',', '', regex=False).astype(float)
# Calculation of voter turnout percentage (total votes divided by total county population)
merged_arizona_data['2016 Voter Turnout'] = merged_arizona_data['2016 Vote Count'] / merged_arizona_data['2016 County Population']
merged_arizona_data['2020 Voter Turnout'] = merged_arizona_data['2020 Vote Count'] / merged_arizona_data['2020 County Population']
merged_arizona_data
# Creating a visualization for the voter turnout data
merged_arizona_data.plot(x="County Name", y=["2016 Voter Turnout", "2020 Voter Turnout"], kind="bar",
title='Arizona Voter Turnout per County', ylabel='Voter Turnout (decimal form)')
print('County Name:\t2016\t\t2020')
for i, r in merged_arizona_data.iterrows():
    print(r['County Name'] + ": " + str(round(r['2016 Voter Turnout']*100, 2)) + "%" + '\t\t'+ str(round(r['2020 Voter Turnout']*100, 2)) + "%")
georgiaPop['county_name'] = georgiaPop['County/State'].str.extract(r'([a-zA-Z]*\s*[a-zA-Z]*\s*County)')
georgiaPop = georgiaPop.drop(columns=['County/State'], axis=1)
cols = georgiaPop.columns.tolist()
cols = cols[-1:] + cols[:-1]
georgiaPop = georgiaPop[cols]
# Creating merged table and dropping appropriate columns to make data look good
merged_georgia_data = pd.merge(left=GA_data_2016, right=georgiaPop, on='county_name')
merged_georgia_data.rename(columns={'2019': '2020 County Population', '2016': '2016 County Population', 'state_abbr':
'State Abbrev.', 'total_votes': '2016 Vote Count'}, inplace=True)
no_need_2016 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop',
'diff', 'per_point_diff', 'combined_fips', 'percent_votes']
no_need_2020 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop', 'diff', 'per_point_diff', 'State Abbrev.', 'percent_votes']
merged_georgia_data = merged_georgia_data.drop(columns=no_need_2016, axis=1)
merged_georgia_data = pd.merge(left=GA_data_2020, right=merged_georgia_data, on='county_name')
merged_georgia_data = merged_georgia_data.drop(columns=no_need_2020, axis=1)
merged_georgia_data.rename(columns={'total_votes': '2020 Vote Count'}, inplace=True)
merged_georgia_data.rename(columns={'county_name': 'County Name'}, inplace=True)
merged_georgia_data = merged_georgia_data[['state_name', 'County Name', 'county_fips', '2016 Vote Count',
'2020 Vote Count', '2016 County Population', '2020 County Population']]
# Converting the county population values from comma-formatted strings to floats
for col in ['2016 County Population', '2020 County Population']:
    merged_georgia_data[col] = merged_georgia_data[col].str.replace(',', '', regex=False).astype(float)
# Calculation of voter turnout percentage (total votes divided by total county population)
merged_georgia_data['2016 Voter Turnout'] = merged_georgia_data['2016 Vote Count'] / merged_georgia_data['2016 County Population']
merged_georgia_data['2020 Voter Turnout'] = merged_georgia_data['2020 Vote Count'] / merged_georgia_data['2020 County Population']
merged_georgia_data
# Creating a visualization for the voter turnout data
merged_georgia_data.plot(x="County Name", y=["2016 Voter Turnout", "2020 Voter Turnout"], kind="bar",
title='Georgia Voter Turnout per County', ylabel='Voter Turnout (decimal form)',
figsize=(25,25))
print('County Name:\t2016\t\t2020')
for i, r in merged_georgia_data.iterrows():
    print(r['County Name'] + ": " + str(round(r['2016 Voter Turnout']*100, 2)) + "%" + '\t\t'+ str(round(r['2020 Voter Turnout']*100, 2)) + "%")
michiganPop['county_name'] = michiganPop['County/State'].str.extract(r'([a-zA-Z]*\s*[a-zA-Z]*\s*County)')
michiganPop = michiganPop.drop(columns=['County/State'], axis=1)
cols = michiganPop.columns.tolist()
cols = cols[-1:] + cols[:-1]
michiganPop = michiganPop[cols]
# Creating merged table and dropping appropriate columns to make data look good
merged_michigan_data = pd.merge(left=MI_data_2016, right=michiganPop, on='county_name')
merged_michigan_data.rename(columns={'2019': '2020 County Population', '2016': '2016 County Population', 'state_abbr':
'State Abbrev.', 'total_votes': '2016 Vote Count'}, inplace=True)
no_need_2016 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop', 'diff', 'per_point_diff', 'combined_fips', 'percent_votes']
no_need_2020 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop', 'diff', 'per_point_diff', 'State Abbrev.', 'percent_votes']
merged_michigan_data = merged_michigan_data.drop(columns=no_need_2016, axis=1)
merged_michigan_data = pd.merge(left=MI_data_2020, right=merged_michigan_data, on='county_name')
merged_michigan_data = merged_michigan_data.drop(columns=no_need_2020, axis=1)
merged_michigan_data.rename(columns={'total_votes': '2020 Vote Count'}, inplace=True)
merged_michigan_data.rename(columns={'county_name': 'County Name'}, inplace=True)
merged_michigan_data = merged_michigan_data[['state_name', 'County Name', 'county_fips', '2016 Vote Count',
'2020 Vote Count', '2016 County Population', '2020 County Population']]
# Converting the county population values from comma-formatted strings to floats
for col in ['2016 County Population', '2020 County Population']:
    merged_michigan_data[col] = merged_michigan_data[col].str.replace(',', '', regex=False).astype(float)
# Calculation of voter turnout percentage (total votes divided by total county population)
merged_michigan_data['2016 Voter Turnout'] = merged_michigan_data['2016 Vote Count'] / merged_michigan_data['2016 County Population']
merged_michigan_data['2020 Voter Turnout'] = merged_michigan_data['2020 Vote Count'] / merged_michigan_data['2020 County Population']
merged_michigan_data
# Creating a visualization for the voter turnout data
merged_michigan_data.plot(x="County Name", y=["2016 Voter Turnout", "2020 Voter Turnout"], kind="bar",
title='Michigan Voter Turnout per County', ylabel='Voter Turnout (decimal form)',
figsize=(25,25))
print('County Name:\t2016\t\t2020')
for i, r in merged_michigan_data.iterrows():
    print(r['County Name'] + ": " + str(round(r['2016 Voter Turnout']*100, 2)) + "%" + '\t\t'+ str(round(r['2020 Voter Turnout']*100, 2)) + "%")
pennsylvaniaPop['county_name'] = pennsylvaniaPop['County/State'].str.extract(r'([a-zA-Z]*\s*[a-zA-Z]*\s*County)')
pennsylvaniaPop = pennsylvaniaPop.drop(columns=['County/State'], axis=1)
cols = pennsylvaniaPop.columns.tolist()
cols = cols[-1:] + cols[:-1]
pennsylvaniaPop = pennsylvaniaPop[cols]
# Creating merged table and dropping appropriate columns to make data look good
merged_pennsylvania_data = pd.merge(left=PA_data_2016, right=pennsylvaniaPop, on='county_name')
merged_pennsylvania_data.rename(columns={'2019': '2020 County Population', '2016': '2016 County Population', 'state_abbr':
'State Abbrev.', 'total_votes': '2016 Vote Count'}, inplace=True)
no_need_2016 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop',
'diff', 'per_point_diff', 'combined_fips', 'percent_votes']
no_need_2020 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop', 'diff', 'per_point_diff', 'State Abbrev.', 'percent_votes']
merged_pennsylvania_data = merged_pennsylvania_data.drop(columns=no_need_2016, axis=1)
merged_pennsylvania_data = pd.merge(left=PA_data_2020, right=merged_pennsylvania_data, on='county_name')
merged_pennsylvania_data = merged_pennsylvania_data.drop(columns=no_need_2020, axis=1)
merged_pennsylvania_data.rename(columns={'total_votes': '2020 Vote Count'}, inplace=True)
merged_pennsylvania_data.rename(columns={'county_name': 'County Name'}, inplace=True)
merged_pennsylvania_data = merged_pennsylvania_data[['state_name', 'County Name', 'county_fips', '2016 Vote Count',
'2020 Vote Count', '2016 County Population', '2020 County Population']]
# Converting the county population values from comma-formatted strings to floats
for col in ['2016 County Population', '2020 County Population']:
    merged_pennsylvania_data[col] = merged_pennsylvania_data[col].str.replace(',', '', regex=False).astype(float)
# Calculation of voter turnout percentage (total votes divided by total county population)
merged_pennsylvania_data['2016 Voter Turnout'] = merged_pennsylvania_data['2016 Vote Count'] / merged_pennsylvania_data['2016 County Population']
merged_pennsylvania_data['2020 Voter Turnout'] = merged_pennsylvania_data['2020 Vote Count'] / merged_pennsylvania_data['2020 County Population']
merged_pennsylvania_data
# Creating a visualization for the voter turnout data
merged_pennsylvania_data.plot(x="County Name", y=["2016 Voter Turnout", "2020 Voter Turnout"], kind="bar",
title='Pennsylvania Voter Turnout per County', ylabel='Voter Turnout (decimal form)',
figsize=(25,25))
print('County Name:\t2016\t\t2020')
for i, r in merged_pennsylvania_data.iterrows():
    print(r['County Name'] + ": " + str(round(r['2016 Voter Turnout']*100, 2)) + "%" + '\t\t'+ str(round(r['2020 Voter Turnout']*100, 2)) + "%")
wisconsinPop['county_name'] = wisconsinPop['County/State'].str.extract(r'([a-zA-Z]*\s*[a-zA-Z]*\s*County)')
wisconsinPop = wisconsinPop.drop(columns=['County/State'], axis=1)
cols = wisconsinPop.columns.tolist()
cols = cols[-1:] + cols[:-1]
wisconsinPop = wisconsinPop[cols]
# Creating merged table and dropping appropriate columns to make data look good
merged_wisconsin_data = pd.merge(left=WI_data_2016, right=wisconsinPop, on='county_name')
merged_wisconsin_data.rename(columns={'2019': '2020 County Population', '2016': '2016 County Population', 'state_abbr':
'State Abbrev.', 'total_votes': '2016 Vote Count'}, inplace=True)
no_need_2016 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop',
'diff', 'per_point_diff', 'combined_fips', 'percent_votes']
no_need_2020 = ['votes_dem', 'votes_gop', 'per_dem', 'per_gop', 'diff', 'per_point_diff', 'State Abbrev.', 'percent_votes']
merged_wisconsin_data = merged_wisconsin_data.drop(columns=no_need_2016, axis=1)
merged_wisconsin_data = pd.merge(left=WI_data_2020, right=merged_wisconsin_data, on='county_name')
merged_wisconsin_data = merged_wisconsin_data.drop(columns=no_need_2020, axis=1)
merged_wisconsin_data.rename(columns={'total_votes': '2020 Vote Count'}, inplace=True)
merged_wisconsin_data.rename(columns={'county_name': 'County Name'}, inplace=True)
merged_wisconsin_data = merged_wisconsin_data[['state_name', 'County Name', 'county_fips', '2016 Vote Count',
'2020 Vote Count', '2016 County Population', '2020 County Population']]
# Convert both population columns from comma-separated strings (e.g. "12,345") to floats
for col in ['2016 County Population', '2020 County Population']:
    merged_wisconsin_data[col] = merged_wisconsin_data[col].str.replace(',', '', regex=False).astype(float)
# Calculation of voter turnout percentage
# Turnout = votes cast / county population, computed column-wise for each year
merged_wisconsin_data['2016 Voter Turnout'] = (
    merged_wisconsin_data['2016 Vote Count'] / merged_wisconsin_data['2016 County Population'])
merged_wisconsin_data['2020 Voter Turnout'] = (
    merged_wisconsin_data['2020 Vote Count'] / merged_wisconsin_data['2020 County Population'])
merged_wisconsin_data
# Creating a visualization for the voter turnout data
merged_wisconsin_data.plot(x="County Name", y=["2016 Voter Turnout", "2020 Voter Turnout"], kind="bar",
title='Wisconsin Voter Turnout per County', ylabel='Voter Turnout (decimal form)',
figsize=(25,25))
print('County Name:\t2016\t\t2020')
for _, r in merged_wisconsin_data.iterrows():
    # Print each county's turnout for both years as a percentage
    print(f"{r['County Name']}: {r['2016 Voter Turnout'] * 100:.2f}%\t\t{r['2020 Voter Turnout'] * 100:.2f}%")
As these five double-bar graphs show, voter turnout increased from 2016 to 2020 in nearly every county of each state. In most of the 2016 data (blue bars), turnout averaged around 40-50%, with a few counties reaching nearly 60-70%. Turning our attention to the 2020 data (orange bars), there is a clear increase, with the average now around 60%. This stark increase is central to our analysis of these states, as higher turnout meant more votes cast and may ultimately have contributed to each state flipping between 2016 and 2020.
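The claim that turnout rose in nearly every county can be checked directly from the merged tables rather than read off the bar charts. The following is a minimal sketch, assuming a dataframe with the '2016 Voter Turnout' and '2020 Voter Turnout' columns built above. The helper name `summarize_turnout_change` and the toy numbers are hypothetical and used only for illustration.

```python
import pandas as pd


def summarize_turnout_change(df):
    """Summarize county-level turnout change between 2016 and 2020.

    Assumes df has '2016 Voter Turnout' and '2020 Voter Turnout' columns
    holding fractions (votes / population), as in the merged tables above.
    """
    # Difference in percentage points for every county
    change = (df['2020 Voter Turnout'] - df['2016 Voter Turnout']) * 100
    return {
        'counties': len(df),
        'counties_increased': int((change > 0).sum()),
        'mean_change_pct_points': round(float(change.mean()), 2),
    }


# Toy data for illustration only; these are not real county figures.
toy = pd.DataFrame({
    'County Name': ['A County', 'B County', 'C County'],
    '2016 Voter Turnout': [0.40, 0.50, 0.60],
    '2020 Voter Turnout': [0.50, 0.60, 0.58],
})
summary = summarize_turnout_change(toy)
print(summary)
```

Calling the same function on each of the five merged state tables would give a per-state count of counties where turnout rose, which makes the "nearly every county" statement easy to verify.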
Visit County Choropleth Maps In Python to learn more.
Using the link above, we will now create maps for each of these five states that we are interested in.
First, we need to install some packages to run the visualization.
!pip install --upgrade plotly
!pip install --upgrade geopandas
!pip install --upgrade pyshp
!pip install --upgrade shapely
!pip install plotly-geo
!pip3 install plotly
!pip install flask
import plotly.figure_factory as ff
import numpy as np
import pandas as pd
from markupsafe import Markup  # flask.Markup was removed in Flask 3.0; markupsafe provides the same class
values = (merged_arizona_data['2020 Voter Turnout']-merged_arizona_data['2016 Voter Turnout'])*100
fips = merged_arizona_data['county_fips']
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = [
'rgb(210,210,254)',
'rgb(193, 193, 193)',
'rgb(195, 196, 222)',
'rgb(144,148,194)',
'rgb(101,104,168)',
'rgb(65, 53, 132)',
'rgb(181, 137, 214)',
'rgb(153, 105, 199)',
'rgb(128, 79, 179)',
'rgb(106, 53, 156)',
'rgb(85, 37, 134)'
]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['Arizona'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Increase in Percentage of Turnout',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=True,
)
fig.layout.template = None
fig.show()
from IPython.display import Image
Image(filename='arizona-county-map.jpg')
The first map above shows the percentage-point increase in voter turnout for each Arizona county. The second is a reference map of Arizona's counties that labels each county and shows its largest cities.
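To read the legend, it helps to see how the binning endpoints partition the values. The sketch below reproduces, in plain Python, the nine evenly spaced endpoints that `np.mgrid[min(values):max(values):9j]` yields and reports which bin a value lands in. The function names and sample values are hypothetical, and the exact edge handling inside plotly's `create_choropleth` is an assumption here, not something confirmed by the source.

```python
from bisect import bisect_left


def even_endpoints(lo, hi, n=9):
    """Return n evenly spaced points from lo to hi inclusive,
    like np.mgrid[lo:hi:9j] or np.linspace(lo, hi, 9)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def bin_index(value, endpoints):
    """Index of the bin a value falls in: 0 means at or below the first
    endpoint, len(endpoints) means above the last one.
    Assumption: a value equal to an endpoint goes into the lower bin."""
    return bisect_left(endpoints, value)


# Hypothetical turnout changes in percentage points, not real data.
changes = [2.0, 5.5, 10.0, 18.0]
endpoints = even_endpoints(min(changes), max(changes))
bins = [bin_index(c, endpoints) for c in changes]
print(endpoints)
print(bins)
```

Nine endpoints define ten bins once the two open-ended ones are counted, which is why the colorscale needs at least that many colors.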
values = (merged_georgia_data['2020 Voter Turnout']-merged_georgia_data['2016 Voter Turnout'])*100
#values[150]=5.0
#values[151]=5.0
values = values.drop(index=[150,151])
fips = merged_georgia_data['county_fips']
fips = fips.drop(index=[150,151])
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = [
'rgb(210,210,254)',
'rgb(193, 193, 193)',
'rgb(195, 196, 222)',
'rgb(144,148,194)',
'rgb(101,104,168)',
'rgb(65, 53, 132)',
'rgb(181, 137, 214)',
'rgb(153, 105, 199)',
'rgb(128, 79, 179)',
'rgb(106, 53, 156)',
'rgb(85, 37, 134)'
]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['GA'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Increase in Percentage of Turnout',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=True,
)
fig.layout.template = None
fig.show()
Image(filename='georgia-county-map.jpg')
The first map above shows the percentage-point increase in voter turnout for each Georgia county. The second is a reference map of Georgia's counties that labels each county and shows its largest cities. Two counties are not shown on the first map: Chattahoochee County, where turnout actually decreased by about 3 percentage points between the two elections, and Stewart County, where it increased only marginally, by 0.7 percentage points.
values = (merged_michigan_data['2020 Voter Turnout']-merged_michigan_data['2016 Voter Turnout'])*100
fips = merged_michigan_data['county_fips']
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = [
'rgb(210,210,254)',
'rgb(193, 193, 193)',
'rgb(195, 196, 222)',
'rgb(144,148,194)',
'rgb(101,104,168)',
'rgb(65, 53, 132)',
'rgb(181, 137, 214)',
'rgb(153, 105, 199)',
'rgb(128, 79, 179)',
'rgb(106, 53, 156)',
'rgb(85, 37, 134)'
]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['MI'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Increase in Percentage of Turnout',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=True,
)
fig.layout.template = None
fig.show()
Image(filename='michigan-county-map.jpg')
The first map above shows the percentage-point increase in voter turnout for each Michigan county. The second is a reference map of Michigan's counties that labels each county and shows its largest cities.
values = (merged_pennsylvania_data['2020 Voter Turnout']-merged_pennsylvania_data['2016 Voter Turnout'])*100
fips = merged_pennsylvania_data['county_fips']
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = [
'rgb(210,210,254)',
'rgb(193, 193, 193)',
'rgb(195, 196, 222)',
'rgb(144,148,194)',
'rgb(101,104,168)',
'rgb(65, 53, 132)',
'rgb(181, 137, 214)',
'rgb(153, 105, 199)',
'rgb(128, 79, 179)',
'rgb(106, 53, 156)',
'rgb(85, 37, 134)'
]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['PA'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Increase in Percentage of Turnout',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=True,
)
fig.layout.template = None
fig.show()
Image(filename='pennsylvania-county-map.png')
The first map above shows the percentage-point increase in voter turnout for each Pennsylvania county. The second is a reference map of Pennsylvania's counties that labels each county and shows its largest cities.
values = (merged_wisconsin_data['2020 Voter Turnout']-merged_wisconsin_data['2016 Voter Turnout'])*100
fips = merged_wisconsin_data['county_fips']
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = [
'rgb(210,210,254)',
'rgb(193, 193, 193)',
'rgb(195, 196, 222)',
'rgb(144,148,194)',
'rgb(101,104,168)',
'rgb(65, 53, 132)',
'rgb(181, 137, 214)',
'rgb(153, 105, 199)',
'rgb(128, 79, 179)',
'rgb(106, 53, 156)',
'rgb(85, 37, 134)'
]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['Wisconsin'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Increase in Percentage of Turnout',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=True,
)
fig.layout.template = None
fig.show()
Image(filename='wisconsin-county-map.jpg')
The first map above shows the percentage-point increase in voter turnout for each Wisconsin county. The second is a reference map of Wisconsin's counties that labels each county and shows its largest cities.
In conclusion, the increase in voter turnout appears to have had a major impact on the result of the 2020 Presidential Election. In the five states we chose to research that flipped from 2016 to 2020 (Arizona, Georgia, Michigan, Pennsylvania, and Wisconsin), our analysis shows that voter turnout increased in each state and in nearly every county within them. This increase may have caused these states to flip, but confirming that hypothesis requires more research. By looking at which counties cast the largest share of a state's votes alongside their turnout increases, one can begin to draw conclusions about which counties contributed most to the flip.
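As a starting point for that follow-up work, the sketch below ranks counties by their share of the statewide 2020 vote and lists the turnout change alongside. It assumes a merged table with the column names used above. The function name `rank_counties` and the toy figures are hypothetical.

```python
import pandas as pd


def rank_counties(df, top_n=3):
    """Rank counties by share of the state's 2020 votes and attach the
    turnout change in percentage points."""
    out = df[['County Name']].copy()
    out['2020 Vote Share (%)'] = df['2020 Vote Count'] / df['2020 Vote Count'].sum() * 100
    out['Turnout Change (pct points)'] = (
        df['2020 Voter Turnout'] - df['2016 Voter Turnout']) * 100
    # Largest contributors to the statewide total come first
    return out.sort_values('2020 Vote Share (%)', ascending=False).head(top_n)


# Toy data for illustration only; these are not real county figures.
toy = pd.DataFrame({
    'County Name': ['A County', 'B County', 'C County', 'D County'],
    '2020 Vote Count': [500, 300, 150, 50],
    '2016 Voter Turnout': [0.45, 0.50, 0.55, 0.60],
    '2020 Voter Turnout': [0.60, 0.58, 0.61, 0.59],
})
ranked = rank_counties(toy, top_n=2)
print(ranked)
```

A ranking like this only shows where the most votes and the largest turnout shifts coincide. It does not by itself show which candidate gained from the added votes, so it supports the hypothesis without confirming it.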