In [1]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
np.random.seed(42)
In [2]:
plt.rcParams['figure.figsize'] = (10,6)

Week 10: Clustering Analysis in Python

Nov 8, 2021

Housekeeping

  • Assignment #5 (optional) due on Wednesday (11/10)
  • Assignment #6 (optional) will be assigned on Wednesday, due in 2 weeks (11/24)
  • Remember: you must complete one of assignments #4, #5, or #6

Recap

  • Last week: urban street networks + interactive web maps
  • New tools: OSMnx, Pandana, and Folium

Where we left off last week: choropleth maps in Folium

Example: A map of households without internet for US counties

Two ways:

  • The easy way: folium.Choropleth (a sketch appears at the end of this example)
  • The hard way: folium.GeoJson
In [3]:
import folium

Load the data, remove counties with no households, and add our new column:

In [4]:
# Load CSV data from the data/ folder
census_data = pd.read_csv("./data/internet_avail_census.csv", dtype={"geoid": str})

# Remove counties with no households
valid = census_data['universe'] > 0
census_data = census_data.loc[valid]

# Calculate the percent without internet
census_data['percent_no_internet'] = census_data['no_internet'] / census_data['universe']
In [5]:
census_data.head()
Out[5]:
NAME universe no_internet state county geoid percent_no_internet
0 Washington County, Mississippi 18299.0 6166.0 28 151 28151 0.336958
1 Perry County, Mississippi 4563.0 1415.0 28 111 28111 0.310103
2 Choctaw County, Mississippi 3164.0 1167.0 28 19 28019 0.368837
3 Itawamba County, Mississippi 8706.0 1970.0 28 57 28057 0.226281
4 Carroll County, Mississippi 3658.0 1218.0 28 15 28015 0.332969

Load the counties geometry data too:

In [6]:
# Load counties GeoJSON from the data/ folder
counties = gpd.read_file("./data/us-counties-10m.geojson")
In [7]:
counties.head()
Out[7]:
id geometry
0 53073 MULTIPOLYGON (((-120.85361 49.00011, -120.7674...
1 30105 POLYGON ((-106.11238 48.99904, -106.15187 48.8...
2 30029 POLYGON ((-114.06985 48.99904, -114.05908 48.8...
3 16021 POLYGON ((-116.04755 49.00065, -116.04755 48.5...
4 30071 POLYGON ((-107.17840 49.00011, -107.20712 48.9...

The hard way: use folium.GeoJson

  • The good:
    • More customizable, and can add user interaction
  • The bad:
    • More work required: we have to write the style functions ourselves

The steps involved

  1. Join data and geometry features into a single GeoDataFrame
  2. Define a function to style features based on data values
  3. Create GeoJSON layer and add it to the map

Step 1: Join the census data and features

Note: this is different from using folium.Choropleth, where data and features are stored in two separate data frames.

In [8]:
# Merge the county geometries with census data
# Left column: "id"
# Right column: "geoid"
census_joined = counties.merge(census_data, left_on="id", right_on="geoid")
In [9]:
census_joined.head()
Out[9]:
id geometry NAME universe no_internet state county geoid percent_no_internet
0 53073 MULTIPOLYGON (((-120.85361 49.00011, -120.7674... Whatcom County, Washington 85008.0 9189.0 53 73 53073 0.108096
1 30105 POLYGON ((-106.11238 48.99904, -106.15187 48.8... Valley County, Montana 3436.0 672.0 30 105 30105 0.195576
2 30029 POLYGON ((-114.06985 48.99904, -114.05908 48.8... Flathead County, Montana 38252.0 5662.0 30 29 30029 0.148018
3 16021 POLYGON ((-116.04755 49.00065, -116.04755 48.5... Boundary County, Idaho 4605.0 1004.0 16 21 16021 0.218024
4 30071 POLYGON ((-107.17840 49.00011, -107.20712 48.9... Phillips County, Montana 1770.0 484.0 30 71 30071 0.273446

Step 2: Normalize the data column to the range 0 to 1

  • We will use a matplotlib color map that requires data to be between 0 and 1
  • Normalize our "percent_no_internet" column to be between 0 and 1
In [10]:
# Minimum
min_val = census_joined['percent_no_internet'].min()

# Maximum
max_val = census_joined['percent_no_internet'].max()

# Calculate a normalized column
normalized = (census_joined['percent_no_internet'] - min_val) / (max_val - min_val)

# Add to the dataframe
census_joined['percent_no_internet_normalized'] = normalized

Step 3: Define our style functions

  • Create a matplotlib colormap object using plt.get_cmap()
  • Color map objects are functions: give the function a number between 0 and 1 and it will return a corresponding color from the color map
  • Based on the feature data, evaluate the color map and convert to a hex string
In [11]:
import matplotlib.colors as mcolors
In [12]:
# Use a red-purple colorbrewer color scheme
cmap = plt.get_cmap('RdPu')
In [13]:
# The minimum value of the color map as an RGB tuple
cmap(0)
Out[13]:
(1.0, 0.9686274509803922, 0.9529411764705882, 1.0)
In [14]:
# The minimum value of the color map as a hex string
mcolors.rgb2hex(cmap(0.0))
Out[14]:
'#fff7f3'
In [15]:
# The maximum value of the color map as a hex string
mcolors.rgb2hex(cmap(1.0))
Out[15]:
'#49006a'
In [16]:
def get_style(feature):
    """
    Given an input GeoJSON feature, return a style dict.
    
    Notes
    -----
    The color in the style dict is determined by the 
    "percent_no_internet_normalized" column in the 
    input "feature".
    """
    # Get the data value from the feature
    value = feature['properties']['percent_no_internet_normalized']
    
    # Evaluate the color map
    # NOTE: value must be between 0 and 1
    rgb_color = cmap(value) # this is an RGB tuple
    
    # Convert to hex string
    color = mcolors.rgb2hex(rgb_color)
    
    # Return the style dictionary
    return {'weight': 0.5, 'color': color, 'fillColor': color, "fillOpacity": 0.75}
In [17]:
def get_highlighted_style(feature):
    """
    Return a style dict to use when the user highlights a 
    feature with the mouse.
    """
    
    return {"weight": 3, "color": "black"}

Step 4: Convert our data to GeoJSON

  • Tip: To limit the amount of data Folium has to process, it's best to trim our GeoDataFrame to only the columns we'll need before converting to GeoJSON
  • You can use the .to_json() function to convert to a GeoJSON string
In [18]:
needed_cols = ['NAME', 'percent_no_internet', 'percent_no_internet_normalized', 'geometry']
census_json = census_joined[needed_cols].to_json()
In [19]:
# STEP 1: Initialize the map
m = folium.Map(location=[40, -98], zoom_start=4)

# STEP 2: Add the GeoJson to the map
folium.GeoJson(
    census_json, # The geometry + data columns in GeoJSON format
    style_function=get_style, # The style function to color counties differently
    highlight_function=get_highlighted_style, 
    tooltip=folium.GeoJsonTooltip(['NAME', 'percent_no_internet'])
).add_to(m)



# avoid a rendering bug by saving as HTML and re-loading
m.save('percent_no_internet.html')

And voilà!

The hard way is harder, but we have a tooltip and highlight interactivity!

In [20]:
from IPython.display import IFrame
In [21]:
IFrame('percent_no_internet.html', width=800, height=500)
Out[21]:
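For comparison, the easy way with folium.Choropleth is much shorter, at the cost of the custom styling and highlight interaction. A sketch, using the same data files as above (the styling keywords shown are illustrative):

# The "easy way": folium.Choropleth builds the style functions for us
m2 = folium.Map(location=[40, -98], zoom_start=4)

folium.Choropleth(
    geo_data="./data/us-counties-10m.geojson",  # the geometry features
    data=census_data,                           # the data values
    columns=["geoid", "percent_no_internet"],   # [key column, value column]
    key_on="feature.id",                        # GeoJSON field to match "geoid" against
    fill_color="RdPu",                          # colorbrewer scheme name
    legend_name="Percent Without Internet",
).add_to(m2)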

At-home exercise: Can we repeat this with altair?

Try to replicate the above interactive map exactly (minus the background tiles). This includes:

  • Using the red-purple colorbrewer scheme
  • Having a tooltip with the percentage and county name

Note: Altair's syntax is similar to that of folium.Choropleth. Pass the counties GeoDataFrame to alt.Chart() and then use transform_lookup() to merge those geometries with the census data, pulling in the column we need ("percent_no_internet").

Hints

In [22]:
import altair as alt
In [23]:
# Initialize the chart with the counties data
alt.Chart(counties).mark_geoshape(stroke="white", strokeWidth=0.25).encode(
    # Encode the color 
    color=alt.Color(
        "percent_no_internet:Q",
        title="Percent Without Internet",
        scale=alt.Scale(scheme="redpurple"),
        legend=alt.Legend(format=".0%")
    ),
    # Tooltip
    tooltip=[
        alt.Tooltip("NAME:N", title="Name"),
        alt.Tooltip("percent_no_internet:Q", title="Percent Without Internet", format=".1%"),
    ],
).transform_lookup(
    lookup="id", # The column name in the counties data to match on
    from_=alt.LookupData(census_data, "geoid", ["percent_no_internet", "NAME"]), # Match census data on "geoid"
).project(
    type="albersUsa"
).properties(
    width=700, height=500
)
Out[23]:

Leaflet/Folium plugins

One of leaflet's strengths: a rich set of open-source plugins

https://leafletjs.com/plugins.html

Many of these are available in Folium!
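For example, the MarkerCluster plugin groups nearby points into zoomable clusters. A quick sketch (the coordinates below are made up for illustration):

from folium.plugins import MarkerCluster

# MarkerCluster groups nearby markers and expands them as you zoom in
m = folium.Map(location=[39.95, -75.16], zoom_start=12)
cluster = MarkerCluster().add_to(m)

# Hypothetical (latitude, longitude) points
for lat, lng in [(39.95, -75.16), (39.96, -75.17), (39.94, -75.15)]:
    folium.Marker(location=[lat, lng]).add_to(cluster)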

Example: Heatmap

In [24]:
from folium.plugins import HeatMap
In [25]:
HeatMap?

Example: A heatmap of new construction permits in Philadelphia in the last 30 days

New construction in Philadelphia requires a building permit, which we can pull from Open Data Philly.

Step 1: Download the data from CARTO

The "current_date" variable in SQL databases

You can use the pre-defined "current_date" variable to get the current date. For example, to get the permits from the past 30 days, we could do:

SELECT * FROM permits WHERE permitissuedate >= current_date - 30

Selecting only new construction permits

To select new construction permits, you can use the "permitdescription" field. There are two relevant categories:

  • "RESIDENTIAL BUILDING PERMIT"
  • "COMMERCIAL BUILDING PERMIT"

We can use the SQL IN command (documentation) to easily select rows that have these categories.

In [26]:
import carto2gpd
In [27]:
# API URL
url = "https://phl.carto.com/api/v2/sql"

# Table name on CARTO
table_name = "permits"

# The where clause, with two parts
DAYS = 30
where = f"permitissuedate >= current_date - {DAYS}"
where += " and permitdescription IN ('RESIDENTIAL BUILDING PERMIT', 'COMMERCIAL BUILDING PERMIT')"

where
Out[27]:
"permitissuedate >= current_date - 30 and permitdescription IN ('RESIDENTIAL BUILDING PERMIT', 'COMMERCIAL BUILDING PERMIT')"
In [28]:
# Run the query
permits = carto2gpd.get(url, table_name, where=where)
In [29]:
len(permits)
Out[29]:
565
In [30]:
permits.head()
Out[30]:
geometry cartodb_id objectid permitnumber addressobjectid parcel_id_num permittype permitdescription commercialorresidential typeofwork ... unit_type unit_num zip censustract council_district opa_owner systemofrecord geocode_x geocode_y posse_jobid
0 POINT (-75.17740 39.93647) 4985 4861 CP-2021-006100 136746075 350978 BUILDING COMMERCIAL BUILDING PERMIT COMMERCIAL ADDITION AND/OR ALTERATION ... None None 19146-4354 21 2 CLEMONS TIMOTHY ECLIPSE 2.689812e+06 230221.856589 391995256
1 None 6036 5279 RP-2021-006695 313880290 Parcel A RESIDENTIAL BUILDING RESIDENTIAL BUILDING PERMIT RESIDENTIAL NEW CONSTRUCTION ... # PARCELA 19122 144 None None ECLIPSE NaN NaN 347147032
2 POINT (-75.16793 39.94323) 11470 10792 RP-2021-013327 130247323 452359 RESIDENTIAL BUILDING RESIDENTIAL BUILDING PERMIT RESIDENTIAL ADDITION AND/OR ALTERATION ... None None 19146-1634 14 2 HEISLER TRACY A, HEISLER ERIC J ECLIPSE 2.692394e+06 232759.345244 385569983
3 POINT (-75.13402 39.99340) 11493 10794 RP-2021-016523 134395653 182217 RESIDENTIAL BUILDING RESIDENTIAL BUILDING PERMIT RESIDENTIAL ADDITION AND/OR ALTERATION ... None None 19133-3529 176.01 7 BURGOS MICHELLE ECLIPSE 2.701358e+06 251308.340764 403583350
4 POINT (-75.18463 39.95285) 15459 15886 CP-2021-001669 132682359 152856 BUILDING COMMERCIAL BUILDING PERMIT COMMERCIAL NEW CONSTRUCTION (SHELL ONLY) ... None None 19104-3413 369 3 HORIZON HOUSE INC ECLIPSE 2.687613e+06 236125.949970 329652657

5 rows × 33 columns

Step 2: Remove missing geometries

Some permits don't have locations; use the .geometry.notnull() function to trim the data frame to the permits with valid geometries.

In [31]:
permits = permits.loc[permits.geometry.notnull()].copy()

Step 3: Extract the lat/lng coordinates

Note: We need an array of (latitude, longitude) pairs. Be careful about the order!

In [32]:
# Extract the latitude and longitude from the geometry column
permits['lat'] = permits.geometry.y
permits['lng'] = permits.geometry.x
In [33]:
permits.head()
Out[33]:
geometry cartodb_id objectid permitnumber addressobjectid parcel_id_num permittype permitdescription commercialorresidential typeofwork ... zip censustract council_district opa_owner systemofrecord geocode_x geocode_y posse_jobid lat lng
0 POINT (-75.17740 39.93647) 4985 4861 CP-2021-006100 136746075 350978 BUILDING COMMERCIAL BUILDING PERMIT COMMERCIAL ADDITION AND/OR ALTERATION ... 19146-4354 21 2 CLEMONS TIMOTHY ECLIPSE 2.689812e+06 230221.856589 391995256 39.936471 -75.177401
2 POINT (-75.16793 39.94323) 11470 10792 RP-2021-013327 130247323 452359 RESIDENTIAL BUILDING RESIDENTIAL BUILDING PERMIT RESIDENTIAL ADDITION AND/OR ALTERATION ... 19146-1634 14 2 HEISLER TRACY A, HEISLER ERIC J ECLIPSE 2.692394e+06 232759.345244 385569983 39.943226 -75.167934
3 POINT (-75.13402 39.99340) 11493 10794 RP-2021-016523 134395653 182217 RESIDENTIAL BUILDING RESIDENTIAL BUILDING PERMIT RESIDENTIAL ADDITION AND/OR ALTERATION ... 19133-3529 176.01 7 BURGOS MICHELLE ECLIPSE 2.701358e+06 251308.340764 403583350 39.993400 -75.134018
4 POINT (-75.18463 39.95285) 15459 15886 CP-2021-001669 132682359 152856 BUILDING COMMERCIAL BUILDING PERMIT COMMERCIAL NEW CONSTRUCTION (SHELL ONLY) ... 19104-3413 369 3 HORIZON HOUSE INC ECLIPSE 2.687613e+06 236125.949970 329652657 39.952847 -75.184629
5 POINT (-75.16541 39.92964) 16078 16365 RP-2021-011113 15443506 428359 RESIDENTIAL BUILDING RESIDENTIAL BUILDING PERMIT RESIDENTIAL ADDITION AND/OR ALTERATION ... 19148-1007 29 1 BUCKMAN LAURIE G ECLIPSE 2.693247e+06 227834.344041 373025230 39.929644 -75.165408

5 rows × 35 columns

In [34]:
# Split out the residential and commercial
residential = permits.query("permitdescription == 'RESIDENTIAL BUILDING PERMIT'")
commercial = permits.query("permitdescription == 'COMMERCIAL BUILDING PERMIT'")
In [35]:
# Make a NumPy array (use the "values" attribute)
residential_coords = residential[['lat', 'lng']].values
commercial_coords = commercial[['lat', 'lng']].values
In [36]:
commercial_coords[:5]
Out[36]:
array([[ 39.936471, -75.177401],
       [ 39.952847, -75.184629],
       [ 39.949254, -75.170423],
       [ 40.058061, -75.078664],
       [ 39.971096, -75.21212 ]])

Step 4: Make a Folium map and add a HeatMap

The HeatMap takes a list of coordinates: the first column is latitude and the second is longitude

Commercial building permits

In [37]:
# Initialize map
m = folium.Map(
    location=[39.99, -75.13],
    tiles='Cartodb Positron',
    zoom_start=12
)


# Add heat map coordinates
HeatMap(commercial_coords, radius=15).add_to(m)

m
Out[37]:

Residential building permits

In [39]:
# Initialize map
m = folium.Map(
    location=[39.99, -75.13],
    tiles='Cartodb Positron',
    zoom_start=12
)


# Add heat map
HeatMap(residential_coords, radius=15).add_to(m)

m
Out[39]:

Takeaways

Commercial construction is concentrated in the greater Center City area, while residential construction happens primarily outside of Center City...

That's it for interactive maps w/ Folium...

Now on to clustering...

Clustering in Python

  • Both spatial and non-spatial datasets
  • Two new techniques:
    • Non-spatial: K-means
    • Spatial: DBSCAN
  • Two labs/exercises this week:
    1. Grouping Philadelphia neighborhoods by AirBnb listings
    2. Identifying clusters in taxi rides in NYC

"Machine learning"

  • The computer learns patterns and properties of an input data set without the user specifying them beforehand
  • Can be both supervised and unsupervised

Supervised

  • Example: classification
  • Given a training set of labeled data, learn to assign labels to new data

Unsupervised

  • Example: clustering
  • Identify structure / clusters in data without any prior knowledge
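In scikit-learn terms, the difference shows up in the fit() call: supervised estimators are fit on features and labels, while unsupervised estimators see only the features. A toy sketch (with made-up data):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0.0], [0.1], [5.0], [5.1]])  # toy features
y = np.array([0, 0, 1, 1])                  # known labels (supervised case only)

# Supervised: learn from labeled examples, then label new data
clf = RandomForestClassifier().fit(X, y)
clf.predict([[0.05], [5.05]])               # should recover the two classes

# Unsupervised: no labels given; groups are discovered from X alone
km = KMeans(n_clusters=2).fit(X)
km.labels_                                  # cluster labels (arbitrary order)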

Machine learning in Python: scikit-learn

  • State-of-the-art machine learning in Python
  • Easy to use, lots of functionality

Clustering is just one (of many) features

https://scikit-learn.org/stable/

Note: We will focus on clustering algorithms today and discuss a few other machine learning techniques in the next two weeks. If there is a specific scikit-learn use case we won't cover, I'm open to ideas for incorporating it as part of the final project.

Part 1: Non-spatial clustering

The goal

Partition a dataset into groups that have a similar set of attributes, or features, within the group and a dissimilar set of features between groups.

Minimize the intra-cluster variance and maximize the inter-cluster variance of features.

Some intuition

K-Means clustering

  • Simple but robust clustering algorithm
  • Widely used
  • Important: user must specify the number of clusters
  • Cannot be used to find density-based clusters

A good introduction

Andrew Ng's Coursera lecture

How does it work?

Minimizes the intra-cluster variance: minimizes the sum of the squared distances between all points in a cluster and the cluster centroid
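In symbols: with $K$ clusters $C_k$ and centroids $\mu_k$, K-means seeks the assignment of points to clusters that minimizes the total within-cluster sum of squares (the "inertia"):

$$\sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$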

K-means in action

Example: Clustering countries by health and income

  • Life expectancy in years vs. GDP per capita and population for 187 countries (as of 2015)
  • Data from Gapminder
In [40]:
import altair as alt
from vega_datasets import data as vega_data

Read the data from a URL:

In [41]:
gapminder = pd.read_csv(vega_data.gapminder_health_income.url)
gapminder.head()
Out[41]:
country income health population
0 Afghanistan 1925 57.63 32526562
1 Albania 10620 76.00 2896679
2 Algeria 13434 76.50 39666519
3 Andorra 46577 84.10 70473
4 Angola 7615 61.00 25021974

Plot it with altair

In [42]:
alt.Chart(gapminder).mark_circle().encode(
    alt.X("income:Q", scale=alt.Scale(type="log")),
    alt.Y("health:Q", scale=alt.Scale(zero=False)),
    size='population:Q',
    tooltip=list(gapminder.columns),
).interactive()
Out[42]:

K-Means with scikit-learn

In [43]:
from sklearn.cluster import KMeans

Let's start with 5 clusters

In [44]:
KMeans?
In [45]:
kmeans = KMeans(n_clusters=5)

Lots of optional parameters, but n_clusters is the most important:

In [46]:
kmeans
Out[46]:
KMeans(n_clusters=5)

Let's fit just income first

Use the fit() function

In [47]:
kmeans.fit(gapminder[['income']])
Out[47]:
KMeans(n_clusters=5)

Extract the cluster labels

Use the labels_ attribute

In [48]:
gapminder['label'] = kmeans.labels_

How big are our clusters?

In [49]:
gapminder.groupby('label').size()
Out[49]:
label
0    106
1     24
2      6
3     50
4      1
dtype: int64

Plot it again, coloring by our labels

In [50]:
alt.Chart(gapminder).mark_circle().encode(
    alt.X('income:Q', scale=alt.Scale(type='log')),
    alt.Y('health:Q', scale=alt.Scale(zero=False)),
    size='population:Q',
    color=alt.Color('label:N', scale=alt.Scale(scheme='dark2')),
    tooltip=list(gapminder.columns)
).interactive()
Out[50]:

Calculate average income by group

In [51]:
gapminder.groupby("label")['income'].mean().sort_values()
Out[51]:
label
0      5279.830189
3     21040.820000
1     42835.500000
2     74966.166667
4    132877.000000
Name: income, dtype: float64

Data is nicely partitioned into income levels
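Side note: once fit, the model can also assign cluster labels to new observations via predict(). A quick sketch (the income values here are hypothetical):

# Label new, made-up income values using the fitted income clusters
new_incomes = np.array([[1000], [20000], [100000]])
kmeans.predict(new_incomes)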

How about health, income, and population?

In [52]:
# Fit all three columns
kmeans.fit(gapminder[['income', 'health', 'population']])

# Extract the labels
gapminder['label'] = kmeans.labels_

Plot the new labels

In [53]:
alt.Chart(gapminder).mark_circle().encode(
    alt.X('income:Q', scale=alt.Scale(type='log')),
    alt.Y('health:Q', scale=alt.Scale(zero=False)),
    size='population:Q',
    color=alt.Color('label:N', scale=alt.Scale(scheme='dark2')),
    tooltip=list(gapminder.columns)
).interactive()
Out[53]:

It... didn't work that well

What's wrong?

K-means is distance-based, but our features have wildly different distance scales

scikit-learn to the rescue: pre-processing

  • Scikit-learn has a utility to normalize features so they have a mean of zero and a variance of one
  • Use the StandardScaler class
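Under the hood the transformation is simple: each feature $x$ is replaced by $z = (x - \mu) / \sigma$, where $\mu$ is the feature's mean and $\sigma$ its standard deviation.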
In [54]:
# An alternative scaler: MinMaxScaler rescales each feature to the range [0, 1]
from sklearn.preprocessing import MinMaxScaler
In [55]:
from sklearn.preprocessing import StandardScaler
In [56]:
scaler = StandardScaler()

Use the fit_transform() function to scale your features

In [57]:
gapminder_scaled = scaler.fit_transform(gapminder[['income', 'health', 'population']])

Important: The fit_transform() function converts the DataFrame to a numpy array:

In [58]:
# fit_transform() converts the data into a numpy array
gapminder_scaled[:5]
Out[58]:
array([[-0.79481258, -1.8171424 , -0.04592039],
       [-0.34333373,  0.55986273, -0.25325837],
       [-0.1972197 ,  0.62456075,  0.00404216],
       [ 1.52369617,  1.6079706 , -0.27303503],
       [-0.49936524, -1.38107777, -0.09843447]])
In [59]:
# mean of zero
gapminder_scaled.mean(axis=0)
Out[59]:
array([ 8.07434927e-17, -1.70511258e-15, -1.89984689e-17])
In [60]:
# standard deviation (and hence variance) of one
gapminder_scaled.std(axis=0)
Out[60]:
array([1., 1., 1.])

Now fit the scaled features

In [61]:
# Perform the fit
kmeans.fit(gapminder_scaled)

# Extract the labels
gapminder['label'] = kmeans.labels_
In [62]:
alt.Chart(gapminder).mark_circle().encode(
    alt.X('income:Q', scale=alt.Scale(type='log')),
    alt.Y('health:Q', scale=alt.Scale(zero=False)),
    size='population:Q',
    color=alt.Color('label:N', scale=alt.Scale(scheme='dark2')),
    tooltip=list(gapminder.columns)
).interactive()
Out[62]:
In [63]:
# Number of countries per cluster
gapminder.groupby("label").size()
Out[63]:
label
0    86
1     2
2    32
3    61
4     6
dtype: int64
In [64]:
# Average population per cluster (in millions)
gapminder.groupby("label")['population'].mean().sort_values() / 1e6
Out[64]:
label
4       2.988746
3      21.441296
0      26.239937
2      32.501032
1    1343.549735
Name: population, dtype: float64
In [65]:
# Average life expectancy per cluster
gapminder.groupby("label")['health'].mean().sort_values()
Out[65]:
label
3    62.232951
1    71.850000
0    74.313837
2    80.806250
4    81.033333
Name: health, dtype: float64
In [66]:
# Average income per cluster (in thousands)
gapminder.groupby("label")['income'].mean().sort_values() / 1e3
Out[66]:
label
3     4.150623
1     9.618500
0    13.229907
2    40.322094
4    86.987500
Name: income, dtype: float64
In [67]:
gapminder.loc[gapminder['label']==4]
Out[67]:
country income health population label
24 Brunei 73003 78.7 423188 4
88 Kuwait 82633 80.7 3892115 4
97 Luxembourg 88314 81.1 567110 4
124 Norway 64304 81.6 5210967 4
134 Qatar 132877 82.0 2235355 4
145 Singapore 80794 82.1 5603740 4
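The fitted model's inertia_ attribute stores the sum of squared distances from each sample to its nearest cluster center, i.e., the quantity K-means minimizes: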
In [68]:
kmeans.inertia_
Out[68]:
80.84493050696734
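Since the inertia always decreases as clusters are added, a common heuristic for choosing n_clusters is the "elbow method": fit a range of cluster counts and look for the bend in the inertia curve. A minimal sketch, reusing the scaled features from above:

# Elbow method: plot inertia vs. number of clusters and look for the "elbow"
inertias = []
ks = range(1, 11)
for k in ks:
    inertias.append(KMeans(n_clusters=k).fit(gapminder_scaled).inertia_)

plt.plot(ks, inertias, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("Inertia")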
In [69]:
gapminder.loc[gapminder['label']==2]
Out[69]:
country income health population label
3 Andorra 46577 84.1 70473 2
8 Australia 44056 81.8 23968973 2
9 Austria 44401 81.0 8544586 2
12 Bahrain 44138 79.2 1377237 2
16 Belgium 41240 80.4 11299192 2
30 Canada 43294 81.7 35939927 2
44 Cyprus 29797 82.6 1165300 2
45 Czech Republic 29437 78.6 10543186 2
46 Denmark 43495 80.1 5669081 2
58 Finland 38923 80.8 5503457 2
59 France 37599 81.9 64395345 2
63 Germany 44053 81.1 80688545 2
65 Greece 25430 79.8 10954617 2
74 Iceland 42182 82.8 329425 2
79 Ireland 47758 80.4 4688465 2
80 Israel 31590 82.4 8064036 2
81 Italy 33297 82.1 59797685 2
83 Japan 36162 83.5 126573481 2
104 Malta 30265 82.1 418670 2
118 Netherlands 45784 80.6 16924929 2
119 New Zealand 34186 80.6 4528526 2
125 Oman 48226 75.7 4490541 2
133 Portugal 26437 79.8 10349803 2
140 Saudi Arabia 52469 78.1 31540372 2
147 Slovenia 28550 80.2 2067526 2
151 South Korea 34644 80.7 50293439 2
153 Spain 32979 81.7 46121699 2
160 Sweden 44892 82.0 9779426 2
161 Switzerland 56118 82.9 8298663 2
175 United Arab Emirates 60749 76.6 9156963 2
176 United Kingdom 38225 81.4 64715810 2
177 United States 53354 79.1 321773631 2

Exercise: Clustering neighborhoods by Airbnb stats

I've extracted Airbnb statistics for Philadelphia neighborhoods from Tom Slee's website.

The data includes average price per person, overall satisfaction, and number of listings.

Two good references for Airbnb data

Original research study: How Airbnb's Data Hid the Facts in New York City

To be continued!

In [ ]: