Week 8: Raster Data in the Wild

Oct 25, 2021

The roadmap

  • Last week: big data with dask and datashader
  • Today and next several weeks: "geo data science in the wild"
    • Profiling useful tools for real-life applications
    • Predictive analytics and machine learning
    • A little bit more "lab" based
  • Ending with a couple of weeks of web-based data visualization

Housekeeping

  • HW #4 due a week from today
  • HW #5 assigned this Wednesday
    • Covers web scraping and big data
    • Due in two weeks (11/10)
  • Two more homework assignments (#6 and #7): HW #7 is required, along with the final project
  • We'll review the final project guidelines next week; the written proposal is due in about a month
  • The final project will be due near the end of the final exam period
In [1]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd

Week 8 Agenda: Raster data analysis with the holoviz ecosystem

Two case studies:

  • Using satellite imagery to detect changes in lake volume
  • Detecting urban heat islands in Philadelphia

The decline of the world's saline lakes

  • A 2017 study examined the volume decline of many of the world's saline lakes
  • The declines are primarily due to water diversion and climate change
  • The study estimates the amount of inflow required to sustain these ecosystems
  • A copy of the study is available in this week's repository

Some examples...

The Aral Sea in Kazakhstan (2000 vs. 2018)

Lake Urmia in Iran (1998 vs. 2011)

Today: Walker Lake (1988 vs. 2017)

Let's analyze this in Python!

In [2]:
import intake
import xarray as xr
import cartopy.crs as ccrs

# hvplot
import hvplot.xarray
import hvplot.pandas
In [3]:
import holoviews as hv
import geoviews as gv
from geoviews.tile_sources import EsriImagery

First, let's use intake to load the data

Catalog file: https://github.com/pyviz-topics/EarthML/blob/master/examples/catalog.yml

Source: EarthML

In [4]:
url = 'https://raw.githubusercontent.com/pyviz-topics/EarthML/master/examples/catalog.yml'
cat = intake.open_catalog(url)
In [5]:
list(cat)
Out[5]:
['landsat_5_small',
 'landsat_8_small',
 'landsat_5',
 'landsat_8',
 'google_landsat_band',
 'amazon_landsat_band',
 'fluxnet_daily',
 'fluxnet_metadata',
 'seattle_lidar']

We'll focus on the Landsat 5 and 8 small datasets.

These are "small" snapshots around Walker Lake, around cut out of the larger Landsat dataset.

In [6]:
landsat_5 = cat.landsat_5_small()
In [7]:
landsat_5
landsat_5_small:
  args:
    chunks:
      band: 1
      x: 50
      y: 50
    concat_dim: band
    storage_options:
      anon: true
    urlpath: s3://earth-data/landsat/small/LT05_L1TP_042033_19881022_20161001_01_T1_sr_band{band:d}.tif
  description: Small version of Landsat 5 Surface Reflectance Level-2 Science Product.
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    cache:
    - argkey: urlpath
      regex: earth-data/landsat
      type: file
    catalog_dir: https://raw.githubusercontent.com/pyviz-topics/EarthML/master/examples
    plots:
      band_image:
        dynamic: false
        groupby: band
        kind: image
        rasterize: true
        width: 400
        x: x
        y: y

Intake lets you define "default" plots for a dataset

Leveraging the power of hvplot under the hood...

In [8]:
landsat_5.hvplot.band_image()
Out[8]:

What's happening under the hood?

In [9]:
landsat_5.hvplot.image(
    x="x",
    y="y",
    groupby="band",
    rasterize=True,
    frame_width=400,
    frame_height=400,
    geo=True,
    crs=32611,
)
Out[9]:

We can do the same for the Landsat 8 data

In [10]:
landsat_8 = cat.landsat_8_small()
In [11]:
landsat_8.hvplot.image(
    x="x",
    y="y",
    groupby="band",
    rasterize=True,
    frame_width=400,
    frame_height=400,
    geo=True,
    crs=32611,
)
Out[11]:

We can use dask to read the data

This will return an xarray DataArray where the values are stored as dask arrays.
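If you want to confirm the laziness yourself, here is a minimal sketch (assuming, as the catalog entry above indicates, that to_dask() returns dask-backed arrays):

import dask.array

da_lazy = cat.landsat_5_small().to_dask()

# The values are a dask array, chunked according to the catalog's "chunks" entry
print(isinstance(da_lazy.data, dask.array.Array))
print(da_lazy.data.chunksize)

# Nothing is pulled from S3 until you explicitly compute
da_in_memory = da_lazy.compute()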

In [12]:
landsat_5_da = landsat_5.to_dask()
landsat_8_da = landsat_8.to_dask()
In [13]:
landsat_5_da
Out[13]:
<xarray.DataArray (band: 6, y: 300, x: 300)>
array([[[ 640.,  842.,  864., ..., 1250.,  929., 1111.],
        [ 796.,  774.,  707., ..., 1136.,  906., 1065.],
        [ 975.,  707.,  908., ..., 1386., 1249., 1088.],
        ...,
        [1243., 1202., 1160., ..., 1132., 1067.,  845.],
        [1287., 1334., 1292., ...,  801.,  934.,  845.],
        [1550., 1356., 1314., ..., 1309., 1636., 1199.]],

       [[ 810., 1096., 1191., ..., 1749., 1266., 1411.],
        [1096., 1048.,  905., ..., 1556., 1217., 1411.],
        [1286., 1000., 1286., ..., 1749., 1604., 1411.],
        ...,
        [1752., 1565., 1566., ..., 1502., 1456., 1078.],
        [1752., 1799., 1706., ...,  983., 1172., 1077.],
        [1893., 1753., 1754., ..., 1736., 2250., 1736.]],

       [[1007., 1345., 1471., ..., 2102., 1462., 1719.],
        [1260., 1259., 1175., ..., 1847., 1419., 1719.],
        [1555., 1175., 1555., ..., 2059., 1889., 1760.],
        ...,
...
        ...,
        [2429., 2138., 2041., ..., 2175., 1885., 1301.],
        [2381., 2333., 2382., ..., 1204., 1495., 1301.],
        [2478., 2041., 2333., ..., 2755., 3431., 2223.]],

       [[1819., 2596., 2495., ..., 2429., 1785., 2023.],
        [2259., 2359., 1885., ..., 2158., 1684., 1921.],
        [2865., 2291., 2664., ..., 2057., 1955., 2057.],
        ...,
        [3081., 2679., 2612., ..., 2499., 2098., 1395.],
        [2779., 2544., 2779., ..., 1429., 1596., 1496.],
        [3183., 2309., 2679., ..., 3067., 3802., 2665.]],

       [[1682., 2215., 2070., ..., 2072., 1440., 1780.],
        [1876., 1973., 1633., ..., 1926., 1586., 1635.],
        [2409., 1924., 2215., ..., 1829., 1780., 1829.],
        ...,
        [2585., 2296., 2296., ..., 2093., 1710., 1134.],
        [2393., 2344., 2489., ..., 1182., 1374., 1326.],
        [2826., 2007., 2393., ..., 2860., 3724., 2333.]]])
Coordinates:
  * band     (band) int64 1 2 3 4 5 7
  * y        (y) float64 4.309e+06 4.309e+06 4.309e+06 ... 4.264e+06 4.264e+06
  * x        (x) float64 3.324e+05 3.326e+05 3.327e+05 ... 3.771e+05 3.772e+05
Attributes:
    transform:      (150.0, 0.0, 332325.0, 0.0, -150.0, 4309275.0)
    crs:            +init=epsg:32611
    res:            (150.0, 150.0)
    is_tiled:       0
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Area

Use ".values" to convert to a numpy array

In [14]:
landsat_5_da.values
Out[14]:
array([[[ 640.,  842.,  864., ..., 1250.,  929., 1111.],
        [ 796.,  774.,  707., ..., 1136.,  906., 1065.],
        [ 975.,  707.,  908., ..., 1386., 1249., 1088.],
        ...,
        [1243., 1202., 1160., ..., 1132., 1067.,  845.],
        [1287., 1334., 1292., ...,  801.,  934.,  845.],
        [1550., 1356., 1314., ..., 1309., 1636., 1199.]],

       [[ 810., 1096., 1191., ..., 1749., 1266., 1411.],
        [1096., 1048.,  905., ..., 1556., 1217., 1411.],
        [1286., 1000., 1286., ..., 1749., 1604., 1411.],
        ...,
        [1752., 1565., 1566., ..., 1502., 1456., 1078.],
        [1752., 1799., 1706., ...,  983., 1172., 1077.],
        [1893., 1753., 1754., ..., 1736., 2250., 1736.]],

       [[1007., 1345., 1471., ..., 2102., 1462., 1719.],
        [1260., 1259., 1175., ..., 1847., 1419., 1719.],
        [1555., 1175., 1555., ..., 2059., 1889., 1760.],
        ...,
        [2090., 1840., 1798., ..., 1828., 1703., 1242.],
        [2048., 2049., 2008., ..., 1074., 1326., 1158.],
        [2216., 1965., 2049., ..., 2202., 2783., 1994.]],

       [[1221., 1662., 1809., ..., 2401., 1660., 1957.],
        [1564., 1465., 1367., ..., 2105., 1610., 1907.],
        [2004., 1465., 1955., ..., 2302., 2055., 2006.],
        ...,
        [2429., 2138., 2041., ..., 2175., 1885., 1301.],
        [2381., 2333., 2382., ..., 1204., 1495., 1301.],
        [2478., 2041., 2333., ..., 2755., 3431., 2223.]],

       [[1819., 2596., 2495., ..., 2429., 1785., 2023.],
        [2259., 2359., 1885., ..., 2158., 1684., 1921.],
        [2865., 2291., 2664., ..., 2057., 1955., 2057.],
        ...,
        [3081., 2679., 2612., ..., 2499., 2098., 1395.],
        [2779., 2544., 2779., ..., 1429., 1596., 1496.],
        [3183., 2309., 2679., ..., 3067., 3802., 2665.]],

       [[1682., 2215., 2070., ..., 2072., 1440., 1780.],
        [1876., 1973., 1633., ..., 1926., 1586., 1635.],
        [2409., 1924., 2215., ..., 1829., 1780., 1829.],
        ...,
        [2585., 2296., 2296., ..., 2093., 1710., 1134.],
        [2393., 2344., 2489., ..., 1182., 1374., 1326.],
        [2826., 2007., 2393., ..., 2860., 3724., 2333.]]])

How about metadata?

Use the attrs attribute

In [15]:
landsat_5_da.attrs
Out[15]:
{'transform': (150.0, 0.0, 332325.0, 0.0, -150.0, 4309275.0),
 'crs': '+init=epsg:32611',
 'res': (150.0, 150.0),
 'is_tiled': 0,
 'nodatavals': (nan,),
 'scales': (1.0,),
 'offsets': (0.0,),
 'AREA_OR_POINT': 'Area'}

Important: the CRS

In [16]:
landsat_5_da.crs
Out[16]:
'+init=epsg:32611'
  • EPSG 32611 is WGS 84 / UTM Zone 11N; Landsat scenes are distributed in the UTM zone that covers them
  • Units are meters
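If you ever need to double-check what an EPSG code refers to, pyproj (a dependency of geopandas and cartopy) can describe it. A quick sketch:

from pyproj import CRS

utm11n = CRS.from_epsg(32611)
print(utm11n.name)                    # WGS 84 / UTM zone 11N
print(utm11n.axis_info[0].unit_name)  # metre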

Evidence of shrinkage?

  • We want to compare the images across time.
  • Data for Landsat 8 is from 2017
  • Data for Landsat 5 is from 1988

Problem: they appear to cover different areas

First: Let's plot RGB images of the two datasets

See week 4's lecture for a reminder!

Hints

  • You will want to use the earthpy package to do the plotting.
  • You can use the data.sel(band=bands) syntax to select specific bands from the dataset, where bands is a list of the desired bands to be selected.

Important notes

  • For Landsat 5, the RGB bands are bands 3, 2, and 1, respectively.
  • For Landsat 8, the RGB bands are bands 4, 3, and 2, respectively.
  • Reference: Landsat 5 and Landsat 8
In [17]:
import earthpy.plot as ep

Landsat 8 color image

In [18]:
# Get the RGB data from the Landsat 8 dataset as a numpy array
rgb_data_8 = landsat_8_da.sel(band=[4, 3, 2]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_8, rgb=(0, 1, 2), ax=ax);

Landsat 5 color image

In [19]:
# Get the RGB data from the Landsat 5 dataset as a numpy array
rgb_data_5 = landsat_5_da.sel(band=[3, 2, 1]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_5, rgb=(0, 1, 2), ax=ax);

Can we trim these to the same areas?

Yes, we can use xarray builtin selection functionality!

The Landsat 5 image is more zoomed in, so let's trim the Landsat 8 data to this range

Take advantage of the "coordinate" arrays provided by xarray:

In [20]:
landsat_5_da.x
Out[20]:
<xarray.DataArray 'x' (x: 300)>
array([332400., 332550., 332700., ..., 376950., 377100., 377250.])
Coordinates:
  * x        (x) float64 3.324e+05 3.326e+05 3.327e+05 ... 3.771e+05 3.772e+05
In [21]:
landsat_5_da.y
Out[21]:
<xarray.DataArray 'y' (y: 300)>
array([4309200., 4309050., 4308900., ..., 4264650., 4264500., 4264350.])
Coordinates:
  * y        (y) float64 4.309e+06 4.309e+06 4.309e+06 ... 4.264e+06 4.264e+06

Remember: These coordinates are in units of meters!

Let's get the bounds of the Landsat 5 image:

In [22]:
# x bounds
xmin = landsat_5_da.x.min()
xmax = landsat_5_da.x.max()

# y bounds
ymin = landsat_5_da.y.min()
ymax = landsat_5_da.y.max()

Slicing with xarray

We can use Python's built-in slice() function to slice the data's coordinate arrays and select the subset of the data we want.

More info: http://xarray.pydata.org/en/stable/indexing.html

Slicing in Python

  • A slice object is used to specify how to slice a sequence.
  • You can specify where to start the slicing, and where to end. You can also specify the step, which allows you to e.g. slice only every other item.

Syntax: slice(start, end, step)

An example

In [23]:
letters = ["a", "b", "c", "d", "e", "f", "g", "h"]

letters[0:3]
Out[23]:
['a', 'b', 'c']
In [24]:
letters[slice(0, 3)]
Out[24]:
['a', 'b', 'c']

Back to xarray and the Landsat data...

We can use the .sel() function to slice our x and y coordinate arrays!

In [25]:
# slice the x and y coordinates
slice_x = slice(xmin, xmax)
slice_y = slice(ymax, ymin) # IMPORTANT: y coordinate array is in descending order

# Use the .sel() to slice
landsat_8_trimmed = landsat_8_da.sel(x=slice_x, y=slice_y)

Important: the y coordinate is stored in descending order, so the slice should be ordered the same way (from ymax to ymin)

In [26]:
landsat_8_da.y
Out[26]:
<xarray.DataArray 'y' (y: 286)>
array([4320900., 4320690., 4320480., ..., 4261470., 4261260., 4261050.])
Coordinates:
  * y        (y) float64 4.321e+06 4.321e+06 4.32e+06 ... 4.261e+06 4.261e+06
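If you'd rather not remember which coordinates are ascending and which are descending, a small helper can build the slice in whichever order the coordinate uses. This is a sketch (ordered_slice is a hypothetical helper, not part of xarray):

def ordered_slice(coord, lo, hi):
    """Return a slice matching the sort order of an xarray coordinate."""
    ascending = bool(coord[0] < coord[-1])
    return slice(lo, hi) if ascending else slice(hi, lo)

landsat_8_trimmed = landsat_8_da.sel(
    x=ordered_slice(landsat_8_da.x, xmin, xmax),
    y=ordered_slice(landsat_8_da.y, ymin, ymax),
)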

Let's look at the trimmed data

In [27]:
landsat_8_trimmed.shape
Out[27]:
(7, 214, 213)
In [28]:
landsat_8_da.shape
Out[28]:
(7, 286, 286)
In [29]:
landsat_8_trimmed
Out[29]:
<xarray.DataArray (band: 7, y: 214, x: 213)>
array([[[ 730.,  721.,  703., ...,  970., 1059.,  520.],
        [ 700.,  751.,  656., ...,  835., 1248.,  839.],
        [ 721.,  754.,  796., ..., 1065., 1207., 1080.],
        ...,
        [1022.,  949.,  983., ...,  516.,  598.,  676.],
        [ 976.,  757.,  769., ...,  950.,  954.,  634.],
        [ 954., 1034.,  788., ..., 1133.,  662., 1055.]],

       [[ 874.,  879.,  858., ..., 1157., 1259.,  609.],
        [ 860.,  919.,  814., ...,  968., 1523.,  983.],
        [ 896.,  939.,  982., ..., 1203., 1412., 1292.],
        ...,
        [1243., 1157., 1212., ...,  604.,  680.,  805.],
        [1215.,  953.,  949., ..., 1095., 1110.,  755.],
        [1200., 1258.,  978., ..., 1340.,  758., 1189.]],

       [[1181., 1148., 1104., ..., 1459., 1633.,  775.],
        [1154., 1223., 1093., ..., 1220., 1851., 1345.],
        [1198., 1258., 1309., ..., 1531., 1851., 1674.],
        ...,
...
        ...,
        [2652., 2459., 2649., ..., 1190., 1299., 1581.],
        [2547., 1892., 2212., ..., 2284., 2416., 1475.],
        [2400., 2627., 2405., ..., 2469., 1579., 2367.]],

       [[3039., 2806., 2877., ..., 1965., 2367., 1203.],
        [2779., 2799., 2474., ..., 1685., 2339., 1637.],
        [2597., 2822., 2790., ..., 2030., 2587., 2116.],
        ...,
        [3144., 2892., 3168., ..., 1453., 1594., 1765.],
        [3109., 2405., 2731., ..., 2804., 3061., 1653.],
        [2518., 3110., 3144., ..., 2801., 2051., 2770.]],

       [[2528., 2326., 2417., ..., 1748., 2139., 1023.],
        [2260., 2238., 1919., ..., 1519., 2096., 1448.],
        [2041., 2226., 2247., ..., 1848., 2380., 1973.],
        ...,
        [2661., 2423., 2556., ..., 1225., 1335., 1469.],
        [2573., 1963., 2091., ..., 2479., 2570., 1393.],
        [2191., 2525., 2504., ..., 2658., 1779., 2514.]]])
Coordinates:
  * band     (band) int64 1 2 3 4 5 6 7
  * y        (y) float64 4.309e+06 4.309e+06 4.309e+06 ... 4.265e+06 4.264e+06
  * x        (x) float64 3.326e+05 3.328e+05 3.33e+05 ... 3.769e+05 3.771e+05
Attributes:
    transform:      (210.0, 0.0, 318195.0, 0.0, -210.0, 4321005.0)
    crs:            +init=epsg:32611
    res:            (210.0, 210.0)
    is_tiled:       0
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Area

Plot the trimmed Landsat 8 data

In [30]:
# Get the trimmed Landsat 8 RGB data as a numpy array
rgb_data_8 = landsat_8_trimmed.sel(band=[4, 3, 2]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_8, rgb=(0, 1, 2), ax=ax);

Plot the original Landsat 5 data, too

In [31]:
# Select the RGB data as a numpy array
rgb_data_5 = landsat_5_da.sel(band=[3,2,1]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10,10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_5, rgb=(0, 1, 2), ax=ax);

Some evidence of shrinkage...but can we do better?

Yes! We'll use the change in the NDVI over time to detect change in lake volume

Calculate the NDVI for the Landsat 5 (1988) and Landsat 8 (2017) datasets

Remember:

  • NDVI = (NIR - Red) / (NIR + Red)
  • You can once again use the .sel() function to select certain bands from the datasets

For Landsat 5: NIR = band 4 and Red = band 3

For Landsat 8: NIR = band 5 and Red = band 4

NDVI 1988

In [32]:
NIR_1988 = landsat_5_da.sel(band=4)
RED_1988 = landsat_5_da.sel(band=3)

NDVI_1988 = (NIR_1988 - RED_1988) / (NIR_1988 + RED_1988)

NDVI 2017

In [33]:
NIR_2017 = landsat_8_da.sel(band=5)
RED_2017 = landsat_8_da.sel(band=4)

NDVI_2017 = (NIR_2017 - RED_2017) / (NIR_2017 + RED_2017)

The difference between 2017 and 1988

  • Take the difference between the 2017 NDVI and the 1988 NDVI
  • Use the hvplot.image() function to show the difference
  • A diverging palette, like coolwarm, is particularly useful for this situation
In [34]:
diff = NDVI_2017 - NDVI_1988

diff.hvplot.image(
    x="x",
    y="y",
    frame_height=400,
    frame_width=400,
    cmap="coolwarm",
    clim=(-1, 1),
    geo=True,
    crs=32611,
)
Out[34]:
In [35]:
NDVI_1988.shape
Out[35]:
(300, 300)
In [36]:
NDVI_2017.shape
Out[36]:
(286, 286)
In [37]:
diff.shape
Out[37]:
(42, 43)

What's going on here? Why so pixelated?

Two issues:

  • Different x/y bounds
  • Different resolutions
In [38]:
# Different x/y ranges!
print(NDVI_1988.x[0].values)
print(NDVI_2017.x[0].values)
332400.0
318300.0
In [39]:
# Different resolutions
print((NDVI_1988.x[1] - NDVI_1988.x[0]).values)
print((NDVI_2017.x[1] - NDVI_2017.x[0]).values)
150.0
210.0

How to improve?

...we can use xarray to put the data on the same grid

First, calculate a bounding box around the center of the data

In [40]:
# The center lat/lng values in EPSG=4326
# I got these points from Google Maps
x0 = -118.7081
y0 = 38.6942

Let's convert these coordinates to the same CRS as the Landsat data

cartopy to the rescue!

In [41]:
import cartopy.crs as ccrs
In [42]:
# Initialize a CRS object for our EPSG=32611 CRS
crs = ccrs.epsg("32611")
In [43]:
# Convert from (x0,y0) in 4326 to (x_center, y_center) in 32611
x_center, y_center = crs.transform_point(x0, y0, ccrs.PlateCarree())

Remember: we are converting from EPSG=4326 (PlateCarree) to EPSG=32611
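The same conversion can be done with pyproj directly, if you prefer it to cartopy; a sketch:

from pyproj import Transformer

# always_xy=True means inputs/outputs are in (lon, lat) / (x, y) order
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32611", always_xy=True)
x_center, y_center = to_utm.transform(x0, y0)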

Let's add a 30 km bounding box around the midpoint

In [44]:
buffer = 15e3  # 15 km, in meters

xmin = x_center - buffer
xmax = x_center + buffer
ymin = y_center - buffer
ymax = y_center + buffer

bounding_box = [[[xmin, ymin], [xmin, ymax], [xmax, ymax], [xmax, ymin]]]

Did this work?

Use the built-in geoviews imagery to confirm...

In [45]:
EsriImagery * gv.Polygons(bounding_box, crs=crs).options(alpha=0.4)
Out[45]:

Now, let's set up the grid

In [46]:
res = 200 # 200 meter resolution
x = np.arange(xmin, xmax, res)
y = np.arange(ymin, ymax, res)

Do the re-gridding

This performs a linear interpolation (the default method for interp()) based on the nearest pixels of the original grid.

In [47]:
NDVI_2017_regridded = NDVI_2017.interp(x=x, y=y)
NDVI_1988_regridded = NDVI_1988.interp(x=x, y=y)

Plot the re-gridded data side-by-side

In [48]:
img1988 = NDVI_1988_regridded.hvplot.image(
    x="x",
    y="y",
    crs=32611,
    geo=True,
    frame_height=300,
    frame_width=300,
    clim=(-3, 1),
    cmap="fire",
    title="1988"
)


img2017 = NDVI_2017_regridded.hvplot.image(
    x="x",
    y="y",
    crs=32611,
    geo=True,
    frame_height=300,
    frame_width=300,
    clim=(-3, 1),
    cmap="fire",
    title="2017"
)

img1988 + img2017
Out[48]:

Now that the images are lined up, the change in lake volume is clearly apparent

In [49]:
diff_regridded = NDVI_2017_regridded - NDVI_1988_regridded
diff_regridded
Out[49]:
<xarray.DataArray (y: 150, x: 150)>
array([[ 0.07285845,  0.02394533,  0.02255797, ...,  0.022373  ,
         0.02654387,  0.01870564],
       [ 0.09309169,  0.11359612,  0.16240596, ...,  0.0184576 ,
         0.01718405,  0.01623927],
       [ 0.1039483 ,  0.15174163,  0.14382052, ...,  0.01866696,
         0.01353435,  0.0147996 ],
       ...,
       [-0.10564601,  0.01861015,  0.0668057 , ...,  0.01163675,
         0.01841919,  0.00968276],
       [ 0.01748623, -0.06021414,  0.03493955, ...,  0.03757851,
         0.02141323,  0.03405552],
       [ 0.0131126 ,  0.08405171, -0.02384596, ...,  0.01839516,
         0.00968287,  0.02835872]])
Coordinates:
  * x        (x) float64 3.365e+05 3.367e+05 3.369e+05 ... 3.661e+05 3.663e+05
  * y        (y) float64 4.269e+06 4.269e+06 4.27e+06 ... 4.299e+06 4.299e+06
In [50]:
diff_regridded.hvplot.image(
    x="x",
    y="y",
    crs=32611,
    geo=True,
    frame_height=400,
    frame_width=400,
    cmap="coolwarm",
    clim=(-1, 1),
)
Out[50]:

The red areas (more vegetation in 2017) show clear loss of lake volume

One more thing: downsampling hi-res data

Given hi-resolution data, we can downsample to a lower resolution with the familiar groupby / mean framework from pandas

Let's try a resolution of 1000 meters instead of 200 meters

In [51]:
# Define a low-resolution grid
res_1000 = 1000
x_1000 = np.arange(xmin, xmax, res_1000)
y_1000 = np.arange(ymin, ymax, res_1000)
In [52]:
# Groupby new bins and take the mean of all pixels falling into a group
# First: groupby low-resolution x bins and take mean
# Then: groupby low-resolution y bins and take mean
diff_res_1000 = (
    diff_regridded.groupby_bins("x", x_1000, labels=x_1000[:-1])
    .mean(dim="x")
    .groupby_bins("y", y_1000, labels=y_1000[:-1])
    .mean(dim="y")
    .rename(x_bins="x", y_bins="y")
)
diff_res_1000
Out[52]:
<xarray.DataArray (y: 29, x: 29)>
array([[ 0.06869101,  0.13177651,  0.15469471,  0.07722272,  0.08874126,
         0.07121849,  0.03444372,  0.037119  ,  0.00104902,  0.04373775,
         0.06956536,  0.11455193,  0.1385741 ,  0.11434505,  0.06491517,
         0.06781426,  0.05570124,  0.05596228,  0.02769747,  0.02554617,
         0.03236267,  0.01968282,  0.02387923,  0.01797904,  0.01902946,
         0.02166195,  0.02174784,  0.01880276,  0.01359409],
       [ 0.08346992,  0.08776704,  0.13548499,  0.08220051,  0.09708871,
         0.06775058,  0.03779001,  0.01991329,  0.07547025,  0.13493956,
         0.20133291,  0.12017601,  0.07325837,  0.05940746,  0.07004369,
         0.0447765 ,  0.04581451,  0.03409407,  0.02344497,  0.03093005,
         0.02869954,  0.03271979,  0.02381046,  0.01556579,  0.02169905,
         0.0227847 ,  0.02022507,  0.018137  ,  0.01564086],
       [ 0.0936053 ,  0.09289295,  0.12060025,  0.12873584,  0.16235808,
         0.10101415,  0.04018545,  0.01535085,  0.07127819,  0.16983906,
         0.10893062,  0.21828906,  0.08079256,  0.07577124,  0.06833529,
         0.01555701, -0.01109825, -0.00299437,  0.01431601,  0.02801818,
         0.0293123 ,  0.02606464,  0.02491255,  0.01810481,  0.01536261,
         0.02188127,  0.02200607,  0.01673377,  0.00776444],
       [ 0.08717537,  0.07700656,  0.13392983,  0.14826298,  0.15392851,
         0.09926764,  0.0569021 ,  0.0610834 ,  0.08476449,  0.08527878,
...
         0.02798621,  0.02992438,  0.03641541,  0.02005467,  0.02204858,
         0.0287752 ,  0.02188122,  0.02495601,  0.02506901],
       [ 0.05668256,  0.07664267,  0.08830451,  0.07859942,  0.12379519,
         0.114724  ,  0.0405303 ,  0.03885428,  0.06682004,  0.05558355,
         0.03833086,  0.02902788,  0.17875198,  0.20029484,  0.01872939,
         0.02381692,  0.02002493,  0.0218631 ,  0.02401623,  0.02320183,
         0.03253913,  0.01651287,  0.02256299,  0.03535588,  0.03159138,
         0.02213577,  0.02914387,  0.029847  ,  0.02951869],
       [ 0.0616654 ,  0.04362798,  0.06757889,  0.13201092,  0.10358936,
         0.08289852,  0.10031057,  0.07410737,  0.06310736,  0.0420152 ,
         0.04179894,  0.02750726, -0.05992471,  0.02938139,  0.03474508,
         0.02488141,  0.02266253,  0.02158754,  0.0207588 ,  0.02306504,
         0.02320071,  0.02288257,  0.02793184,  0.02504522,  0.02886683,
         0.022406  ,  0.01791923,  0.0284036 ,  0.02830287],
       [ 0.04925822,  0.07870229,  0.1287353 ,  0.13487135,  0.07996429,
         0.09300206,  0.07809081,  0.07023948,  0.04476815,  0.03548731,
         0.03374816, -0.04190856, -0.00502124,  0.03805152,  0.03067927,
         0.02737551,  0.02609662,  0.02289044,  0.02446029,  0.02962656,
         0.0284653 ,  0.02105816,  0.03050465,  0.02811649,  0.02367856,
         0.03380146,  0.02753168,  0.02918184,  0.02360341]])
Coordinates:
  * y        (y) float64 4.269e+06 4.27e+06 4.271e+06 ... 4.296e+06 4.297e+06
  * x        (x) float64 3.365e+05 3.375e+05 3.385e+05 ... 3.635e+05 3.645e+05
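As an aside: because 1,000 meters is an integer multiple (5x) of the 200 meter grid, xarray's coarsen() would produce the same kind of block average more directly. A sketch, not what we use above:

# Average 5x5 blocks of 200 m pixels into 1000 m pixels
diff_coarse = diff_regridded.coarsen(x=5, y=5, boundary="trim").mean()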

Now let's plot the low-resolution version of the difference

In [53]:
diff_res_1000.hvplot.image(
    x="x",
    y="y",
    crs=32611,
    geo=True,
    frame_width=500,
    frame_height=400,
    cmap="coolwarm",
    clim=(-1, 1),
)
Out[53]:

On to example 2: urban heat islands

First load some metadata for Landsat 8

In [54]:
band_info = pd.DataFrame([
        (1,  "Aerosol", " 0.43 - 0.45",    0.440,  "30",         "Coastal aerosol"),
        (2,  "Blue",    " 0.45 - 0.51",    0.480,  "30",         "Blue"),
        (3,  "Green",   " 0.53 - 0.59",    0.560,  "30",         "Green"),
        (4,  "Red",     " 0.64 - 0.67",    0.655,  "30",         "Red"),
        (5,  "NIR",     " 0.85 - 0.88",    0.865,  "30",         "Near Infrared (NIR)"),
        (6,  "SWIR1",   " 1.57 - 1.65",    1.610,  "30",         "Shortwave Infrared (SWIR) 1"),
        (7,  "SWIR2",   " 2.11 - 2.29",    2.200,  "30",         "Shortwave Infrared (SWIR) 2"),
        (8,  "Panc",    " 0.50 - 0.68",    0.590,  "15",         "Panchromatic"),
        (9,  "Cirrus",  " 1.36 - 1.38",    1.370,  "30",         "Cirrus"),
        (10, "TIRS1",   "10.60 - 11.19",   10.895, "100 * (30)", "Thermal Infrared (TIRS) 1"),
        (11, "TIRS2",   "11.50 - 12.51",   12.005, "100 * (30)", "Thermal Infrared (TIRS) 2")],
    columns=['Band', 'Name', 'Wavelength Range (µm)', 'Nominal Wavelength (µm)', 'Resolution (m)', 'Description']).set_index(["Band"])
band_info
Out[54]:
Band  Name     Wavelength Range (µm)  Nominal Wavelength (µm)  Resolution (m)  Description
   1  Aerosol   0.43 - 0.45            0.440                   30              Coastal aerosol
   2  Blue      0.45 - 0.51            0.480                   30              Blue
   3  Green     0.53 - 0.59            0.560                   30              Green
   4  Red       0.64 - 0.67            0.655                   30              Red
   5  NIR       0.85 - 0.88            0.865                   30              Near Infrared (NIR)
   6  SWIR1     1.57 - 1.65            1.610                   30              Shortwave Infrared (SWIR) 1
   7  SWIR2     2.11 - 2.29            2.200                   30              Shortwave Infrared (SWIR) 2
   8  Panc      0.50 - 0.68            0.590                   15              Panchromatic
   9  Cirrus    1.36 - 1.38            1.370                   30              Cirrus
  10  TIRS1    10.60 - 11.19          10.895                   100 * (30)      Thermal Infrared (TIRS) 1
  11  TIRS2    11.50 - 12.51          12.005                   100 * (30)      Thermal Infrared (TIRS) 2

Landsat data on Google Cloud Storage

We'll be downloading publicly available Landsat data from Google Cloud Storage

See: https://cloud.google.com/storage/docs/public-datasets/landsat

The relevant information is stored in our intake catalog:

From our catalog.yml file:

google_landsat_band:
    description: Landsat bands from Google Cloud Storage
    driver: rasterio
    parameters:
      path:
        description: landsat path
        type: int
      row:
        description: landsat row
        type: int
      product_id:
        description: landsat file id
        type: str
      band:
        description: band
        type: int
    args:
      urlpath: https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/{{ '%03d' % path }}/{{ '%03d' % row }}/{{ product_id }}/{{ product_id }}_B{{ band }}.TIF
      chunks:
        band: 1
        x: 256
        y: 256

From the "urlpath" above, you can see we need "path", "row", and "product_id" variables to properly identify a Landsat image:

The path and row corresponding to the area over Philadelphia has already been selected using the Earth Explorer. This tool was also used to find the id of the particular date of interest using the same tool.
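To make the template concrete, here is roughly the URL that intake builds from these parameters (the Jinja expression {{ '%03d' % path }} is equivalent to Python's f"{path:03d}"; band 4 is chosen arbitrarily for illustration):

path, row = 14, 32
product_id = "LC08_L1TP_014032_20160727_20170222_01_T1"
band = 4  # arbitrary band, just to show the pattern

url = (
    "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/"
    f"{path:03d}/{row:03d}/{product_id}/{product_id}_B{band}.TIF"
)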

In [55]:
# Necessary variables
path = 14
row = 32
product_id = 'LC08_L1TP_014032_20160727_20170222_01_T1'

An alternative: Google Earth Engine (GEE)

  • A public data archive of satellite imagery going back more than 40 years.
  • A Python API exists for pulling data; it's a good resource if you'd like to work with a large amount of raster data

We won't cover the specifics in the course, but geemap is a fantastic library for interacting with GEE.

Google Cloud Storage: Use a utility function to query the API and request a specific band

This will return a specific Landsat band as a dask array.

In [56]:
from random import random
from time import sleep


def get_band_with_exponential_backoff(path, row, product_id, band, maximum_backoff=32):
    """
    Load a single Landsat band from Google Cloud Storage, retrying with
    the exponential backoff that Google recommends when accessing the API.

    https://cloud.google.com/storage/docs/exponential-backoff
    """
    n = backoff = 0
    while backoff < maximum_backoff:
        try:
            return cat.google_landsat_band(
                path=path, row=row, product_id=product_id, band=band
            ).to_dask()
        except Exception:
            # Wait 2^n seconds plus random jitter (capped at maximum_backoff)
            # before retrying; once the cap is reached, give up (returns None)
            backoff = min(2 ** n + random(), maximum_backoff)
            sleep(backoff)
            n += 1

Load all of the bands and combine them into a single xarray DataArray

Loop over each band, load that band using the above function, and then concatenate the results together.

In [57]:
bands = [1, 2, 3, 4, 5, 6, 7, 9, 10, 11]

datasets = []
for band in bands:
    da = get_band_with_exponential_backoff(
        path=path, row=row, product_id=product_id, band=band
    )
    da = da.assign_coords(band=[band])
    datasets.append(da)

ds = xr.concat(datasets, "band", compat="identical")

ds
Out[57]:
<xarray.DataArray (band: 10, y: 7871, x: 7741)>
dask.array<concatenate, shape=(10, 7871, 7741), dtype=uint16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 1 2 3 4 5 6 7 9 10 11
  * y        (y) float64 4.582e+06 4.582e+06 4.582e+06 ... 4.346e+06 4.346e+06
  * x        (x) float64 3.954e+05 3.954e+05 3.955e+05 ... 6.276e+05 6.276e+05
Attributes:
    transform:      (30.0, 0.0, 395385.0, 0.0, -30.0, 4582215.0)
    crs:            +init=epsg:32618
    res:            (30.0, 30.0)
    is_tiled:       1
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Point

Important: CRS info

CRS is EPSG=32618

Also grab the metadata from Google Cloud Storage

  • There is an associated metadata file stored on Google Cloud Storage
  • The below function will parse that metadata file for a specific path, row, and product ID
  • The specifics of this are not overly important for our purposes
In [58]:
def load_google_landsat_metadata(path, row, product_id):
    """
    Load Landsat metadata for path, row, product_id from Google Cloud Storage
    """

    def parse_type(x):
        # Interpret values as Python literals where possible; fall back to the raw string
        try:
            return eval(x)
        except Exception:
            return x

    baseurl = "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01"
    suffix = f"{path:03d}/{row:03d}/{product_id}/{product_id}_MTL.txt"

    df = intake.open_csv(
        urlpath=f"{baseurl}/{suffix}",
        csv_kwargs={
            "sep": "=",
            "header": None,
            "names": ["index", "value"],
            "skiprows": 2,
            "converters": {"index": (lambda x: x.strip()), "value": parse_type},
        },
    ).read()
    metadata = df.set_index("index")["value"]
    return metadata
In [59]:
metadata = load_google_landsat_metadata(path, row, product_id)
metadata.head()
Out[59]:
index
ORIGIN                Image courtesy of the U.S. Geological Survey
REQUEST_ID                                     0501702206266_00020
LANDSAT_SCENE_ID                             LC80140322016209LGN01
LANDSAT_PRODUCT_ID        LC08_L1TP_014032_20160727_20170222_01_T1
COLLECTION_NUMBER                                               01
Name: value, dtype: object
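The result is a pandas Series indexed by the MTL field names, so individual values can be looked up directly (keys taken from the output above):

print(metadata["LANDSAT_SCENE_ID"])   # LC80140322016209LGN01
print(metadata["COLLECTION_NUMBER"])  # 01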

Exercise: Let's trim our data to the Philadelphia limits

  • The Landsat image covers an area much wider than the Philadelphia limits
  • We'll load the city limits from Open Data Philly
  • Use the slice() function discussed in the last example!

1. Load the city limits

In [60]:
# Load the GeoJSON from the URL
url = "http://data.phl.opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson"
city_limits = gpd.read_file(url)

# Convert to the right CRS for this data
city_limits = city_limits.to_crs(epsg=32618)

2. Use the xmin/xmax and ymin/ymax of the city limits to trim the Landsat DataArray

  • Use the built-in slice functionality of xarray
  • Remember, the y coordinates are in descending order, so your slice will need to be in descending order as well
In [61]:
# Get x/y range of city limits from "total_bounds"
xmin, ymin, xmax, ymax = city_limits.total_bounds
In [62]:
# Slice our xarray data
# NOTE: the y coordinates are in descending order!
subset = ds.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))

Call the compute() function on your sliced data

  • Originally, the Landsat data was stored as dask arrays
  • Calling compute() loads into memory only the sliced data covering Philadelphia

This should take about a minute or so, depending on the speed of your laptop.

In [63]:
subset = subset.compute()

subset
Out[63]:
<xarray.DataArray (band: 10, y: 1000, x: 924)>
array([[[10702, 10162, 10361, ..., 11681, 11489, 15594],
        [10870, 10376, 10122, ..., 11620, 11477, 12042],
        [10429, 10147, 10063, ..., 11455, 11876, 11790],
        ...,
        [11944, 11696, 11626, ..., 11192, 11404, 10923],
        [12406, 11555, 11155, ..., 11516, 11959, 10766],
        [11701, 11232, 10657, ..., 10515, 11475, 10926]],

       [[ 9809,  9147,  9390, ..., 10848, 10702, 15408],
        [ 9989,  9405,  9139, ..., 10831, 10660, 11361],
        [ 9439,  9083,  8981, ..., 10654, 11141, 11073],
        ...,
        [11220, 10853, 10741, ..., 10318, 10511,  9950],
        [11765, 10743, 10259, ..., 10646, 11378,  9829],
        [10889, 10365,  9630, ...,  9500, 10573, 10008]],

       [[ 9042,  8466,  8889, ..., 10014,  9647, 14981],
        [ 9699,  8714,  8596, ...,  9866,  9783, 11186],
        [ 8623,  8457,  8334, ...,  9688, 10474,  9993],
        ...,
...
        ...,
        [ 5027,  5028,  5038, ...,  5035,  5037,  5029],
        [ 5058,  5021,  5023, ...,  5035,  5041,  5031],
        [ 5036,  5041,  5040, ...,  5036,  5044,  5035]],

       [[29033, 28976, 28913, ..., 32614, 32577, 32501],
        [28940, 28904, 28858, ..., 32516, 32545, 32545],
        [28882, 28879, 28854, ..., 32431, 32527, 32563],
        ...,
        [30094, 29929, 29713, ..., 29521, 29525, 29429],
        [30140, 29972, 29696, ..., 29556, 29516, 29398],
        [30133, 29960, 29614, ..., 29587, 29533, 29424]],

       [[25558, 25519, 25492, ..., 27680, 27650, 27619],
        [25503, 25450, 25402, ..., 27636, 27630, 27639],
        [25473, 25434, 25378, ..., 27609, 27668, 27667],
        ...,
        [26126, 26055, 25934, ..., 25622, 25586, 25520],
        [26149, 26077, 25935, ..., 25651, 25594, 25507],
        [26147, 26050, 25856, ..., 25696, 25644, 25571]]], dtype=uint16)
Coordinates:
  * band     (band) int64 1 2 3 4 5 6 7 9 10 11
  * y        (y) float64 4.443e+06 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
  * x        (x) float64 4.761e+05 4.761e+05 4.761e+05 ... 5.037e+05 5.038e+05
Attributes:
    transform:      (30.0, 0.0, 395385.0, 0.0, -30.0, 4582215.0)
    crs:            +init=epsg:32618
    res:            (30.0, 30.0)
    is_tiled:       1
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Point
In [64]:
# Original data was about 8000 x 8000 in size
ds.shape
Out[64]:
(10, 7871, 7741)
In [65]:
# Sliced data is only about 1000 x 1000 in size!
subset.shape
Out[65]:
(10, 1000, 924)

Let's plot band 3 of the Landsat data

In [66]:
# City limits plot
limits_plot = city_limits.hvplot.polygons(
    line_color="white", alpha=0, line_alpha=1, crs=32618, geo=True
)

# Landsat plot
landsat_plot = subset.sel(band=3).hvplot.image(
    x="x",
    y="y",
    rasterize=True,
    cmap="viridis",
    frame_width=500,
    frame_height=500,
    geo=True,
    crs=32618,
)

# Combined
landsat_plot * limits_plot
Out[66]:

Calculate the NDVI

Remember, NDVI = (NIR - Red) / (NIR + Red)

In [67]:
band_info.head()
Out[67]:
Name Wavelength Range (µm) Nominal Wavelength (µm) Resolution (m) Description
Band
1 Aerosol 0.43 - 0.45 0.440 30 Coastal aerosol
2 Blue 0.45 - 0.51 0.480 30 Blue
3 Green 0.53 - 0.59 0.560 30 Green
4 Red 0.64 - 0.67 0.655 30 Red
5 NIR 0.85 - 0.88 0.865 30 Near Infrared (NIR)

NIR is band 5 and Red is band 4

In [68]:
# Select the bands we need
NIR = subset.sel(band=5)
RED = subset.sel(band=4)

# Calculate the NDVI
NDVI = (NIR - RED) / (NIR + RED)
NDVI = NDVI.where(NDVI < np.inf)  # mask the infinite values that occur where NIR + RED == 0

NDVI.head()
Out[68]:
<xarray.DataArray (y: 5, x: 5)>
array([[0.34690712, 0.43934922, 0.45820226, 0.45798014, 0.38205128],
       [0.33747045, 0.40266976, 0.47276229, 0.44722264, 0.4530777 ],
       [0.40783302, 0.44093268, 0.47419128, 0.49448025, 0.49341935],
       [0.40190311, 0.39986498, 0.46889397, 0.48497283, 0.48686142],
       [0.47248631, 0.47255625, 0.46365524, 0.37692424, 0.36055777]])
Coordinates:
  * y        (y) float64 4.443e+06 4.443e+06 4.443e+06 4.443e+06 4.443e+06
  * x        (x) float64 4.761e+05 4.761e+05 4.761e+05 4.762e+05 4.762e+05
In [69]:
# NEW: Use datashader to plot the NDVI
p = NDVI.hvplot.image(
    x="x",
    y="y",
    geo=True,
    crs=32618,
    datashade=True, # NEW: USE DATASHADER
    project=True, # NEW: PROJECT THE DATA BEFORE PLOTTING
    frame_height=500,
    frame_width=500,
    cmap="viridis",
)

# NEW: Use a transparent rasterized version of the plot for hover text
# No hover text is available with datashaded images, so we overlay a rasterized version
p_hover = NDVI.hvplot(
    x="x",
    y="y",
    crs=32618,
    geo=True,
    rasterize=True,
    frame_height=500,
    frame_width=500,
    alpha=0, # SET ALPHA=0 TO MAKE THIS TRANSPARENT
    colorbar=False,
)

# City limits
limits_plot = city_limits.hvplot.polygons(
    geo=True, crs=32618, line_color="white", line_width=2, alpha=0, line_alpha=1
)

p * p_hover * limits_plot
Out[69]:

A couple of notes

  • Notice that we leveraged the datashader algorithm to plot the NDVI by specifying the datashade=True keyword
  • Hover tools won't work with datashaded images; instead, we overlaid a transparent rasterized version of the image, which supplies the mean pixel values for the hover text