9/8/2021
To follow along today, go to https://github.com/MUSA-550-Fall-2021/week-1
The following line imports the pandas package:
import pandas as pd
The primary objects in pandas are the:
- DataFrame, which is very similar to an Excel spreadsheet, and has rows and named columns
- Series, which represents a single column of data. A DataFrame contains one or more Series and a name for each Series.

The data frame is a commonly used abstraction for data manipulation. Similar implementations exist in R.
You can think of Series objects as fancier versions of Python's built-in list data type (we'll see one example of this below).
To create a Series object:
# city names and population
city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])
city_names
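One way a Series is "fancier" than a plain Python list: arithmetic operations apply element-wise to every value at once. A minimal sketch using the population Series from above:
# vectorized arithmetic: divides every value in the Series by 1e6
# (a plain Python list would raise a TypeError here)
population / 1e6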
DataFrame objects can be created by passing a dict mapping string column names to their respective Series.
cities_df = pd.DataFrame({'City Name': city_names,
                          'Population': population})
cities_df
Note: always try to give your variables meaningful names. It will help immensely when you are trying to debug problems or when you're trying to understand the code you've written months later.
# access columns with a dict-like syntax
cities_df['Population']
# list slicing: get the first two elements (indices 0 and 1, but NOT 2)
cities_list = ['San Francisco', 'San Jose', 'Sacramento']
cities_list[:2]
Unfortunately, the slicing functionality for built-in lists is fairly limited...but pandas has many more features!
# slicing data frame rows is very similar!
cities_df[:2]
pandas includes functionality for many different ways of selecting data. See the documentation for many more examples.
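As a quick preview of a few other selection patterns (we'll cover label- and position-based selection in more detail below), here is a minimal sketch using the cities_df frame from above:
# select multiple columns by passing a list of column names
cities_df[['City Name', 'Population']]
# select a row by its index label
cities_df.loc[0]
# select a row by its integer position
cities_df.iloc[-1]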
cities_df['Population'].median()
NumPy is a popular toolkit for scientific computing.
pandas Series can be used as arguments to most NumPy functions:
import numpy as np
# calculate the median population value
np.median(cities_df['Population'])
For more complex single-column transformations, you can use Series.apply. It accepts a function that is applied to each value in the Series.
For example, we can find which cities have a population greater than a million:
# define our function
def get_large_cities(population):
    return population > 1e6
large_cities_sel = cities_df['Population'].apply(get_large_cities)
large_cities_sel
# add the new computed column to our original data frame
cities_df["Large Cities"] = large_cities_sel
# we can also use lambda (unnamed, inline) functions
small_cities_func = lambda population: population < 1e6

cities_df["Small Cities"] = cities_df["Population"].apply(
    small_cities_func
)
# print out
cities_df
We can select the "large" cities by passing the boolean values to the .loc[] indexer of a DataFrame:
cities_df['Large Cities']
cities_df.loc[cities_df['Large Cities']]
cities_df['Population'] > 1e6
# this is equivalent to doing
cities_df.loc[cities_df['Population'] > 1e6]
Use the Python tilde operator to do a logical not operation:
# reverse the large cities boolean selection
cities_df.loc[~large_cities_sel]
# or equivalently:
# NOTE: you need to put the whole expression in () and then apply the tilde!
cities_df.loc[~ (cities_df['Population'] > 1e6) ]
We can also use the pandas query() function.
The query() function will return a subset of your dataframe based on a string version of the boolean expression.
cities_df.query("Population < 1e6")
The documentation is your best friend! Use the question mark operator!
# use the question mark
pd.DataFrame.loc?
Let's load census data on rental rates from the 2019 American Community Survey.
We've stored the data locally in the comma-separated value (CSV) format. Later in the course, you'll learn how to download them directly using Python.
Use the read_csv()
function from pandas. The first argument to read_csv()
is the file path to load.
census_df = pd.read_csv("./data/census/ACS_19_1YR_B25003.csv")
census_df.head(n=5)
Select certain columns and rename them.
## FIRST STEP: let's trim to the columns we want
# columns we want
columns = ['GEO.display-label', 'HD01_VD01', 'HD01_VD03']
census_df = census_df[columns]
# Peek at the dataframe with trimmed columns
census_df.head()
## STEP 2: Let's rename the columns!
# rename columns
census_df.columns = ['City', 'Total', 'Rentals']
# Return the head (first 5 rows) from the cell
census_df.head(n=5)
# Step 3: Add the rental rate as a new column
census_df['Rental Rate'] = census_df['Rentals'] / census_df['Total'] * 100
census_df.head()
# How many rows are in the dataframe? --> use the len() function
len(census_df)
We can select the largest cities by population using the Series.isin() function, which checks if each value in the Series is in the specified list.
# Define the list of city names we want
top_cities_list = [
    "Philadelphia city, Pennsylvania",
    "New York city, New York",
    "Los Angeles city, California",
    "Houston city, Texas",
    "Chicago city, Illinois",
    "Phoenix city, Arizona",
]
# Use isin() to test whether each row's City value is in the list
census_df["City"].isin(top_cities_list)
# Now do the selection!
census_df.loc[census_df["City"].isin(top_cities_list)] # select the valid rows
Only 6 rows as expected!
Alternatively, we could use the query() function. The syntax is a bit different (and I always forget and have to look at the documentation).
In the query() string, you can reference existing variables by prefacing the variable name with an "@" symbol.
For example:
# Get the rows where the City column is in "top_cities_list"
census_df.query("City in @top_cities_list")
Python uses the "%" operator to insert variable values into strings. For example, the file name of the data we want to load for 2019 is:
year_str = "19"
path = "./data/census/ACS_%s_1YR_B25003.csv" % year_str
print(path)
See this guide on this type of string formatting for more info.
Python version 3.6 introduced a new and improved string formatting syntax, where the variable that is being inserted can be referenced directly using the variable name.
The variable name should be enclosed in curly braces inside the string:
year_str = "19"
path = f"./data/census/ACS_{year_str}_1YR_B25003.csv"
print(path)
For more info: see this guide
Let's use a for loop to loop over each year value from 2005 to 2019:
all_rental_data = []
start_year = 2005
stop_year = 2019

for year in range(start_year, stop_year + 1):

    #---------------------------------------------
    # Step 0: Print out year
    # The variable year changes for each iteration of the loop
    #----------------------------------------------
    print(f"Loading data for year = {year}...")

    #---------------------------------
    # Step 1: Read data for this year
    #---------------------------------
    year_str = str(year)  # convert integer value of "year" to a string
    year_str = year_str[2:]  # extract the last two digits of the year string
    rental_df = pd.read_csv(f"./data/census/ACS_{year_str}_1YR_B25003.csv")

    #---------------------------------
    # Step 2: Columns we want
    #---------------------------------
    columns = ['GEO.display-label', 'HD01_VD01', 'HD01_VD03']
    rental_df = rental_df[columns]

    #---------------------------------
    # Step 3: Rename columns
    #---------------------------------
    rental_df.columns = ['City', 'Total', 'Rentals']

    #----------------------------------
    # Step 4: Calculate the rental rate
    #----------------------------------
    rental_df['Rental Rate'] = rental_df['Rentals'] / rental_df['Total'] * 100

    #----------------------------------
    # Step 5: Select the cities we want
    #----------------------------------
    selection = rental_df['City'].isin(top_cities_list)
    top_cities_df = rental_df.loc[selection].copy()

    #----------------------------------
    # Step 6: Add a column for the year
    #----------------------------------
    top_cities_df['Year'] = year

    #-----------------------------------
    # Step 7: Save this dataframe in the list
    #-----------------------------------
    all_rental_data.append(top_cities_df)

#---------------------------------------------------
# Step 8: After the for loop, combine all of the
# data frames into one along the row axis
#---------------------------------------------------
all_rental_data = pd.concat(all_rental_data, axis=0)
# Note: the pd.concat() took a list of dataframes and converted them
# into a single dataframe!
type(all_rental_data)
all_rental_data.head()
# What about the tail?
all_rental_data.tail()
# How many rows? Hopefully (6 cities x 15 years = 90!)
len(all_rental_data)
matplotlib
Much more to come next week. For now, we'll use some very simple plotting commands with Python's main plotting library, matplotlib.
Import the library we'll use from matplotlib:
from matplotlib import pyplot as plt
We'll use the plot() function, which plots a simple x vs y line. We'll call the plot() function on the sub-selection of the data frame for each city.
To find the unique city names in the 'City' column, we can use the unique() function:
all_rental_data['City'].unique()
with plt.style.context('ggplot'):

    # Step 1: Create a figure and axes to plot on
    fig, ax = plt.subplots(figsize=(10, 6))

    # Step 2: Loop over each city and plot
    for city in all_rental_data['City'].unique():

        # Print out the city in each loop iteration
        print(city)

        # Select data for this city
        this_city = all_rental_data['City'] == city
        city_df = all_rental_data.loc[this_city]

        # Plot Year vs Rental Rate
        ax.plot(city_df['Year'], city_df['Rental Rate'], label=city, linewidth=4)

    # Add a legend
    ax.legend(loc=0, ncol=3, fontsize=12)

    # Set the y limit
    ax.set_ylim(30, 72)

    # Add a y label
    ax.set_ylabel("Rental Rate in Percent");
Note: we used the "ggplot" theme to style our plots. Matplotlib comes with a number of built-in styles: see all of the examples here.
# trim to just Philadelphia first, using this boolean selection index
all_rental_data['City'] == 'Philadelphia city, Pennsylvania'
# select Philadelphia only
philly = all_rental_data.loc[all_rental_data['City']=='Philadelphia city, Pennsylvania']
philly
Note how the first row of the above dataframe has an index label of 393. This means we can access it using the .loc[] function like:
philly.loc[393]
This is different than the .iloc[] function, which indexes based on the integer position of the row, regardless of what the labels in the index are. For example, the first row of a dataframe can always be accessed using:
philly.iloc[0]
For more details on the iloc() function, see the documentation on indexing by position.
We can reset the index labels so they range from 0 to the length of the dataframe, using the reset_index() function. For example:
philly.reset_index(drop=True)
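Note that by default reset_index() keeps the old index labels around as a new column; passing drop=True (as above) discards them instead. A minimal sketch:
# without drop=True, the old index labels become a new 'index' column
philly.reset_index()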
Next, to calculate our percent difference we need to get the values for Philadelphia in 2005 and 2019:
# select 2005
philly_2005 = philly.loc[philly['Year']==2005]
philly_2005
# Notice: because we have a dataframe with a length of 1, we don't get back just a number
philly_2005["Rental Rate"]
len(philly_2005["Rental Rate"])
Notice that philly_2005 has only a single row. It would be more useful to have this as a scalar (just a single number).
We can use the .squeeze() function to do this. It does just what it sounds like: if you have a DataFrame with only one row, it will "squeeze" the row dimension by removing it, returning just a Series object:
philly_2005 = philly.loc[philly['Year']==2005].squeeze()
philly_2005
value_2005 = philly_2005["Rental Rate"]
value_2005
# Do the same thing for 2019 — make sure you squeeze!
philly_2019 = philly.loc[philly['Year']==2019].squeeze()
# Get the value
value_2019 = philly_2019['Rental Rate']
value_2019
# calculate percent change
percent_change = (value_2019 - value_2005)/ value_2005 * 100
print(f"change in rental rate in Philadelphia from 2005 to 2019 = {percent_change}")
Load citywide Zillow Rent Index (ZRI) and Zillow Home Value Index (ZHVI) data.
Files were downloaded from https://www.zillow.com/research/data/
home_values = pd.read_csv("data/zillow/Metro_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv")
rent_values = pd.read_csv("data/zillow/Metro_ZORI_AllHomesPlusMultifamily_SSA.csv")
Peek at the first few rows of the ZRI data:
rent_values.head()
And do the same for the ZHVI data:
home_values.head()
valid_cities = [
    "New York, NY",
    "Chicago, IL",
    "Los Angeles-Long Beach-Anaheim, CA",
    "Philadelphia, PA",
    "Houston, TX",
    "Phoenix, AZ",
]
selection = home_values['RegionName'].isin(valid_cities)
home_values_trimmed = home_values.loc[selection]
selection = rent_values['RegionName'].isin(valid_cities)
rent_values_trimmed = rent_values.loc[selection]
rent_values_trimmed
Unwanted columns can be dropped from the data frame using the drop() function.
Note that the column axis is the second axis (axis=1), and if you wanted to remove rows, you could use the first axis (axis=0).
columns = ['SizeRank', 'RegionID', 'RegionType', 'StateName']
home_values_final = home_values_trimmed.drop(columns, axis=1)
columns = ['SizeRank', 'RegionID']
rent_values_final = rent_values_trimmed.drop(columns, axis=1)
rent_values_final
Currently, our data is in wide format $\rightarrow$ each observation has its own column. This usually results in many columns but few rows.
home_values_final
Usually it's better to have data in tidy (also known as long) format.
Tidy datasets are arranged such that each variable is a column and each observation is a row.
In our case, we want to have a column called ZRI and one called ZHVI, and a row for each month that the indices were measured.
pandas provides the melt() function for converting from wide formats to tidy formats.
melt() doesn't aggregate or summarize the data. It transforms it into a different shape, but it contains the exact same information as before.
Imagine you have 6 rows of data (each row is a unique city) with 10 columns of home values (each column is a different month). That is wide data and is the format usually seen in spreadsheets or tables in a report.
If you melt() that wide data, you would get a table with 60 rows and 3 columns. Each row would contain the city name, the month, and the home value for that city and month. This tidy-formatted data contains the same info as the wide data, but in a different form.
This animation shows the transformation from wide to long / long to wide. You can ignore gather() and spread() - those are the R versions of the pandas functions.
pd.melt?
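To make the reshaping concrete, here is a minimal sketch using a tiny, made-up wide table (the city names and values below are invented for illustration):
# a tiny wide table: one row per city, one column per month
wide = pd.DataFrame({
    "RegionName": ["Philadelphia, PA", "Houston, TX"],
    "2021-01": [1400.0, 1200.0],
    "2021-02": [1410.0, 1210.0],
})

# melt to tidy format: one row per (city, month) observation
pd.melt(wide, id_vars=["RegionName"], var_name="Date", value_name="Value")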
Now, let's melt our datasets:
ZHVI = pd.melt(
    home_values_final,
    id_vars=["RegionName"],
    value_name="ZHVI",
    var_name="Date"
)
ZRI = pd.melt(
    rent_values_final,
    id_vars=["RegionName"],
    value_name="ZRI",
    var_name="Date"
)
and take a look:
ZRI.tail()
ZHVI.head()
Another common operation is merging, also known as joining, two datasets.
We can use the merge() function to merge observations that have the same Date and RegionName values.
- ZHVI has the Date column in the format YYYY-MM-DD
- ZRI has the Date column in the format YYYY-MM

We need to put them into the same format before merging the data!
We can fix this by creating Datetime objects and formatting the dates into the same format.
Currently our Date column is stored as a string.
pandas includes additional functionality for dates, but first we must convert the strings using the to_datetime() function.
# Convert the Date column to Datetime objects
ZHVI["Date"] = pd.to_datetime(ZHVI["Date"])
The strftime function
We can use the ".dt" property of the Date column to access datetime functions of the new Datetime column.
For converting to strings in a certain format, we can use the "strftime" function (docs). This uses a special syntax to convert the date object to a string with a specific format.
Important reference: Use this strftime guide to look up the syntax!
# Extract YYYY-MM string
date_strings = ZHVI["Date"].dt.strftime("%Y-%m")
# First entry is a string!
date_strings.iloc[0]
# Add the strings back as a column
ZHVI["Date"] = date_strings
ZHVI.head()
zillow_data = pd.merge(ZRI, ZHVI, on=['Date', 'RegionName'], how='outer')
zillow_data.sort_values("Date")
Merging is very powerful and the merge can be done in a number of ways. See the infographic on joining in this repository.
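As a rough sketch of how the how= keyword changes which rows survive the merge, consider two tiny, made-up frames that only partially overlap:
# two small frames sharing only the key 'b'
left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["b", "c"], "y": [3, 4]})

pd.merge(left, right, on="key", how="inner")  # keys in both: just 'b'
pd.merge(left, right, on="key", how="left")   # all keys from left: 'a', 'b'
pd.merge(left, right, on="key", how="outer")  # all keys from either: 'a', 'b', 'c'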
Available on GitHub:
Two parts:
Due date: Monday 9/20 by the start of class (7pm)