Sep 20, 2021
# Let's setup the imports we'll need first
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
Both terms are very common in Python geospatial software.
Actually several files with the same common prefix
Mandatory files:
And many optional files for documentation, projection information, etc.
Let's take a look at an example shapefile:
We'll use the %ls
command to list out all of the files in an example shapefile in the data/
folder
%ls "data/ne_110m_admin_0_countries/"
GitHub example from the data/
directory: Philadelphia ZIP Codes
geopandas
provides a simple, intuitive for the main types of geospatial vector file formats
Example: Let's load a shape file of countries in the world...
import geopandas as gpd
We can use the read_file() function to read shapefiles and GeoJSON files.
# Read the shape file, giving the name of the directory
countries = gpd.read_file("./data/ne_110m_admin_0_countries")
countries.head()
type(countries)
GeoDataFrame
?¶Just like a DataFrame
but with a geometry
column
# Print out the first 10 entires of the "geometry" column
countries['geometry'].head(n=10)
pandas
...¶Calculate the total world population:
countries['pop_est'].sum()/1e9 # In billions
Calculate the total population on each continent:
grouped = countries.groupby('continent')
grouped
groupby()
does not return a DataFrame
— you need to call sum()
, mean()
, etc, or apply()
a function.
# Sum population on each continent
pop_by_continent = grouped['pop_est'].sum()
# Sort values
pop_by_continent.sort_values(ascending=False, inplace=True)
# Output sorted values from cell
pop_by_continent/1e9
Filter the data frame based on a boolean selection:
# Is the country name USA?
is_USA = countries['name']=='United States of America'
# Get the row with USA
USA = countries.loc[is_USA]
USA
squeeze()
function¶It does just one it sounds like: if you have a DataFrame with only one row, it will "squeeze" the row dimension by removing it, returning just a Series object:
# Squeeze
USA = USA.squeeze()
# Print out the type
print("The type of USA is: ", type(USA))
# Output
USA
The simple features (Lines, Points, Polygons) are implemented by the shapely library
type(USA.geometry)
Jupyter notebook renders shapely
geometries automatically:
# a mini USA
USA.geometry
geopandas
handle coordinate systems and map projections?¶A coordinate reference system (CRS) relates the position of a geometry object on the spherical earth to its two-dimensional coordinates.
A GeoDataFrame
or GeoSeries
has a .crs
attribute which specifies the coordinate reference system.
countries.crs
x
and y
are longitude and latitude. Use the plot()
function to get a quick and dirty plot of all of the geometry features.
Note: the plot()
returns the current maplotlib axes, allowing you to format the chart after plotting.
# Create a figure and axes
fig, ax = plt.subplots(figsize=(10, 6))
# Plot the countries on our axes
ax = countries.plot(ax=ax, facecolor="none", edgecolor="black")
# Add a title
ax.set_title("Equirectangular Projection")
matplotlib
and cartopy
are combined to make geo-aware plots
import cartopy.crs as ccrs
# Initialize the EPSG:4326 CRS object
wgs84 = ccrs.PlateCarree()
# Create a geo-aware axes using the "projetion" keyword
fig, ax = plt.subplots(figsize=(10, 6), subplot_kw={"projection": wgs84})
# Print out the type of the axes for info purposes
print("The type of the axes is: ", type(ax))
# Add the geometry shapes
a x.add_geometries(countries["geometry"], crs=wgs84, facecolor="none", edgecolor="black")
# Add a title
ax.set_title("Equirectangular Projection");
See the Geopandas documentation for more examples: Plotting with CartoPy and GeoPandas
Use the to_crs()
function. The most well-known projections can be specified by their EPSG code.
Geopandas documentation on re-projecting: Managing Projections
# Remove Antartica, as the Mercator projection
# cannot deal with the poles
no_antartica = countries.loc[(countries['name'] != "Antarctica")]
# Two ways to specify the EPSG code
countries_mercator = no_antartica.to_crs(epsg=3395)
# Alternatively:
#countries_mercator = no_antartica.to_crs("EPSG:3395")
countries_mercator.head()
Note: the magnitude of the values in the geometry column changed! A quick and easy way to tell if the re-projection worked properly!
The easy way...with geopandas built-in plot()
function
# Initialize the figure and axes
fig, ax = plt.subplots(figsize=(10, 6))
# Use built-in plot() of the GeoDataFrame
ax = countries_mercator.plot(ax=ax,
facecolor="none",
edgecolor="black")
# Add a title
ax.set_title("Mercator Projection");
With cartopy and matplotlib
# Initialize the CRS object
crs = ccrs.epsg(3395) # or crs = ccrs.Mercator()
# Create a geo-aware axes using the "projetion" keyword
fig, ax = plt.subplots(figsize=(10, 6),
subplot_kw={"projection": crs})
# Add the geometry shapes
ax.add_geometries(countries_mercator['geometry'],
crs=crs, facecolor='none', edgecolor='black')
# Add a title
ax.set_title('Mercator Projection');
When you need more customizable, advanced plots.
Nearly anything that matplotlib
can do can be plotted on a cartopy
GeoAxes
. Plotting directly with matplotlib
allows you to take full advantage of matplotlib
's functionality.
We'll use the provided City_Limits
shape file in the data/
folder
city_limits = gpd.read_file('./data/City_Limits')
city_limits
Use the .crs
attribute to find out!
city_limits.crs
# Create our figure and axes
fig, ax = plt.subplots(figsize=(5, 5))
# Plot
city_limits.plot(ax=ax, facecolor="none", edgecolor="black")
# Format
ax.set_title("Equirectangular")
ax.set_axis_off() # This will remove the axes completely
ax.set_aspect("equal") # This forces an equal aspect ratio
This is not what Philadelphia looks like..
Let's try EPSG=3857 instead:
# Create the figure
fig, ax = plt.subplots(figsize=(5, 5))
# Convert to EPSG:3857
city_limits_3857 = city_limits.to_crs(epsg=3857)
# Plot and format
city_limits_3857.plot(ax=ax, facecolor="none", edgecolor="black")
ax.set_title("Web Mercator")
ax.set_axis_off()
ax.set_aspect("equal");
Important: the equirectangular CRS (EPSG=4326) is often used by default and will make cities appear wider and flatter than they really are
Use the to_file()
function and specify the driver.
# ESRI shape file
city_limits_3857.to_file("./data/city_limits_3857",
driver='ESRI Shapefile')
# GeoJSON is also an option
city_limits_3857.to_file("./data/city_limits_3857.geojson",
driver='GeoJSON')
Yes, but reading requires more work...
# save a csv file
city_limits_3857.to_csv("./data/city_limits_3857.csv", index=False)
df = pd.read_csv("./data/city_limits_3857.csv")
df.head()
But, the "geometry" column is just stored as a string...it's not a shapely Polygon
type(df.geometry)
shapely
to parse the string version of the polygons¶from shapely import wkt
# wkt.loads will convert from string to Polygon object
df['geometry'] = df['geometry'].apply(wkt.loads)
df.geometry.iloc[0]
We can initialize the GeoDataFrame directly from a DataFrame but we need to specify two things:
In this case, the geometry column was saved in Web Mercator EPSG=3857
# Make specifying the name of the geometry column and CRS
gdf= gpd.GeoDataFrame(df, geometry='geometry', crs="EPSG:3857")
# Now plot
fig, ax = plt.subplots(figsize=(5,5))
ax = gdf.plot(ax=ax, facecolor='none', edgecolor='black')
ax.set_axis_off()
ax.set_aspect("equal")
The tilt should be a bit more obvious now...
ax = (
gdf.to_crs(epsg=4326)
.plot(facecolor='none', edgecolor='black')
)
ax.set_axis_off()
ax.set_aspect("equal")
plt.subplots()
here to create a figure/axes – I let geopandas automatically make oneto_crs()
and .plot()
functions in one line.plot()
function returns the axes object that geopandas used to plot — this lets you customizes the axes after plotting# Load some cities data
cities = gpd.read_file("./data/ne_110m_populated_places")
(Image by Krauss, CC BY-SA 3.0)
All of these operations are available as functions of a GeoDataFrame
.
# Select the Point representing New York City
new_york = cities.loc[cities['name'] == 'New York', 'geometry'].squeeze()
new_york
type(new_york)
countries.contains(new_york)
# Find the country that contains New York
countries.loc[countries.contains(new_york)]
# Get the geometry column of the country containing NYC
USA = countries.loc[countries.contains(new_york)].squeeze().geometry
USA
The .loc[]
function can take the index selector as the first argument, and the name of a column as a second argument (separated by a comma)
type(USA)
# Is New York within the USA?
new_york.within(USA)
The different functions for checking spatial relationships:
equals
contains
crosses
disjoint
intersects
overlaps
touches
within
covers
See the shapely documentation for an overview of these methods.
SPATIAL JOIN = merging attributes from two geometry layers based on their spatial relationship
Different parts of this operations:
In this case, we want to join the cities
dataframe, containing Point
geometries, with the information of the countries
dataframe, containing Polygon
geometries.
To match cities with countries, we'll use the within
spatial relationship.
The geopandas.sjoin()
function performs this operation:
joined = gpd.sjoin(cities, countries, op='within', how='left')
joined.head()
cities_in_italy = joined.loc[joined['name_right'] == 'Italy']
cities_in_italy
# Extract Italy
italy = countries.loc[countries['name']=='Italy']
# Plot
fig, ax = plt.subplots(figsize=(8,8))
italy.plot(ax=ax, facecolor='none', edgecolor='black')
ax.set_axis_off()
ax.set_aspect("equal")
# Plot the first city in the joined data frame (Vatican City)
# Use the same axes by passing in the ax=ax keyword
ax = cities_in_italy.plot(ax=ax, color='red')
We can also perform the join()
operation on the geometries rather than just combining attributes.
The overlay()
function combines geometries, e.g. by taking the intersection of the geometries.
africa = countries.loc[countries['continent'] == 'Africa']
## What crs?
africa.crs
# Let's transform to a CRS that uses meters
# instead of degrees (EPSG=3857)
africa = africa.to_crs(epsg=3857)
africa.crs
fig, ax = plt.subplots(figsize=(8,8))
africa.plot(ax=ax, facecolor='#b9f2b1')
ax.set_axis_off()
ax.set_aspect("equal")
# Important CRS needs to match!
cities = cities.to_crs(epsg=3857)
# Create a copy of the GeoDataFrame
buffered_cities = cities.copy()
# Add a buffer region of 250 km around all cities
buffered_cities['geometry'] = buffered_cities.buffer(250e3)
fig, ax = plt.subplots(figsize=(8, 8))
# Calculate the difference of the geometry sets
diff = gpd.overlay(africa, buffered_cities, how='difference')
# Plot
diff.plot(facecolor="#b9f2b1", ax=ax)
ax.set_axis_off()
ax.set_aspect("equal")
# Data attributes are the same as the first data frame (africa)
# with an updated geometry column
diff.head()
fig, ax = plt.subplots(figsize=(8, 8))
# The intersection of the geometry sets
intersection = gpd.overlay(africa, buffered_cities, how='intersection')
# Plot
intersection.plot(ax=ax, facecolor="#b9f2b1")
ax.set_axis_off()
ax.set_aspect("equal")
Load 311 requests in Philadelphia from the data/
directory.
Source: OpenDataPhilly
# Load the data from a CSV file into a pandas DataFrame
requests = pd.read_csv('./data/public_cases_fc_2020.csv.tar.gz')
print("number of requests = ", len(requests))
requests.head()
Remove the requests missing lat/lon coordinates
requests = requests.dropna(subset=['lat', 'lon'])
Create Point objects for each lat
and lon
combination.
We can use the helper utility function: geopandas.points_from_xy()
requests['Coordinates'] = gpd.points_from_xy(requests['lon'], requests['lat'])
requests['Coordinates'].head()
Now, convert to a GeoDataFrame.
Important
ESPG:4326
Since we're only using a few EPSG codes in this course, you can usually tell what the CRS is by looking at the values in the Point() objects.
Philadelphia has a latitude of about 40 deg and longitude of about -75 deg.
Our data must be in the usual lat/lng EPSG=4326.
requests = gpd.GeoDataFrame(requests,
geometry="Coordinates",
crs="EPSG:4326")