Ever wanted to display just a simple map of your Pandas data, but weren’t sure how? Me too. I quickly found that there are lots of instructions about how to generate complex, sophisticated maps — but I just needed a quick step-by-step, for a complete beginner with a simple Pandas dataframe. I’ll walk you through how I did it.
My dataset was a simple list of the 50 states with SAT participation rates from 2017. I wanted to create a map of the USA, with colored shading to distinguish high-participation states from low-participation states. I read the dataset into Pandas using pd.read_csv.
Like any good data science student, I did a Google search and found that there are lots of options for creating maps from a Pandas dataframe. I looked into three options:
plot.ly has an impressive line-up of visualizations and map functions. But it also costs $59 for a student membership, and I wasn’t there yet.
vincent has an easy-to-follow walk-through to get you started using maps. But unfortunately, the library is deprecated. According to the repo, “There will be no more updates, closed issues, or PR merges for the Vincent project. Thanks so much to everyone who tried it or used it along the way.”
folium was recommended to me by one of my instructors at General Assembly. I ended up using this library; it was easy to access and quick to learn. Most importantly, folium provides an easy-to-follow QuickStart document with clear examples.
One of the first terms I learned from this search was “choropleth map” — essentially, a map linked to a data variable. Apparently it’s from the Ancient Greek word χώρα (khṓra, “location”) + πλῆθος (plêthos, “a great number”). The word was first used in 1938 as choroplethe map, by American geographer John Kirtland Wright (got that tidbit from Wiktionary). Cool, huh?
The first step, of course, is to install and import the folium library:
pip install folium
import folium
The next step was to read in a base map of the United States. I forked the entire folium repo on GitHub, cloned it to my MacBook, and pointed to the USA map file as follows:
# Read in our map:
my_USA_map = '../data/us-states.json'
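The line above only stores a file path; folium reads the file later. To see what the base map expects from the data, it can help to look at the ids inside the GeoJSON. The sketch below uses a tiny inline stand-in rather than the real file, and it assumes that us-states.json is a FeatureCollection in which each feature carries a two-letter `id`. The names `sample_geojson` and `state_ids` are made up for illustration.

```python
import json

# Tiny inline stand-in for us-states.json (assumed layout; geometry left empty).
sample_geojson = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "id": "AZ", "properties": {"name": "Arizona"},
    "geometry": {"type": "Polygon", "coordinates": []}},
   {"type": "Feature", "id": "MT", "properties": {"name": "Montana"},
    "geometry": {"type": "Polygon", "coordinates": []}}
 ]}
"""

# With the real file this would be: with open(my_USA_map) as f: geo = json.load(f)
geo = json.loads(sample_geojson)

# Collect the id of every feature; these are the keys the dataframe must match.
state_ids = [feature["id"] for feature in geo["features"]]
print(state_ids)
```

If the ids turn out to be abbreviations, the dataframe's state column has to use abbreviations too, otherwise nothing will line up when the data is joined to the map.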
Following the QuickStart guide, I set the parameters for the map as latitude 48, longitude -102, which sets the map’s center someplace around northern Montana (the Fort Belknap Reservation, actually). I played around with the location settings to see how different settings might refocus the map, but ultimately stayed with Montana. Seemed like a great place to start.
map = folium.Map(location=[48, -102], zoom_start=3)
After some exploration, I found that the code I had forked from folium wanted a dataset with only the 50 states, so I dropped Washington DC from my dataframe (I’m located in DC, but I imagine the rest of the country won’t miss us). I only needed two columns: the state and its participation rate on the SAT. But the starter code expected the states to be tagged by their abbreviations, so I had to update the states column in my dataframe by replacing each state’s name with its abbreviation (so Arizona becomes AZ).
Once I’d made these changes I was able to link my dataframe to the map. I used the following code from Folium’s quickstart guide:
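The cleanup step described above is separate from the quickstart code. A minimal pandas sketch of it might look like the following. The column names `State` and `Participation`, the placeholder values, and the partial `name_to_abbrev` lookup are all hypothetical stand-ins, not the original dataset.

```python
import pandas as pd

# Hypothetical stand-in for the SAT dataframe; values are placeholders, not real rates.
sat = pd.DataFrame({
    "State": ["Arizona", "District of Columbia", "Montana"],
    "Participation": [30, 100, 10],
})

# Partial lookup for illustration; a real one would cover all 50 states.
name_to_abbrev = {"Arizona": "AZ", "Montana": "MT"}

# Drop DC, since the base map only contains the 50 states.
states_only = sat[sat["State"] != "District of Columbia"].copy()

# Replace full names with abbreviations to match the ids used by the base map.
states_only["State"] = states_only["State"].map(name_to_abbrev)

# A name missing from the lookup becomes NaN, so fail early if that happens.
assert states_only["State"].notna().all()

print(states_only)
```

Using `.map()` with a dictionary turns any unmatched name into NaN rather than silently keeping it, which makes a gap in the lookup easy to spot before the data reaches the map.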
However, even though I’d followed the quickstart guide pretty carefully, this code didn’t work. I got the following error message:
'Map' object has no attribute 'json'
I searched the folium repo until I found that somebody else had experienced the same problem: the .geo_json() method is broken, and has been replaced by new syntax. Rather than map.geo_json(geo_path=, the correct code is now map.choropleth(geo_data=. Once I made this change, the rest of the code I’d taken from the quickstart guide ran smoothly for me. By playing around with the starter code, I found I could change the darkness of the fill and lines, and update the name of the legend. I also discovered you can toggle the color scheme by replacing YlGn with BuPu. My final code looked like this:
fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2,
legend_name='Participation Rate (%)')
My map, showing the lower 48 states by SAT participation level, looked like this:
This state-level participation data, taken from the SAT website, becomes a little more interesting when we compare it to participation data from the SAT’s major competitor, the ACT test (data for both tests are from 2017). When I ran the ACT participation rates for the 50 states through the same process as the SAT rates, an interesting shadow effect emerged: states with high SAT participation tend to have low ACT participation, and vice versa (it’s kind of a heartland vs. coasts pattern, if you will):
This result makes sense when you consider that most high school students will take only one of the two tests, but not both. In addition, most state departments of education will tend to favor one test, either by subsidizing the cost, offering it on campus during school hours, or making it mandatory for all public school students in the state. An interesting pair of outliers to this pattern are the two Carolinas: while both have 100% participation on the ACT, they also have 50% participation on the SAT.
This was a pretty simple first step into the world of choropleth maps using Pandas dataframes and Folium. What initially seemed like an insurmountable task quickly became manageable when I discovered Folium’s clear documentation and starter code. I look forward to increasing my level of skill and familiarity with this exciting tool, and producing even more detailed maps in the near future.