### Ingest Module Example Usage

- Before running this notebook, select the kernel that corresponds to your Poetry environment.
- Poetry should load the environment variables from the `.env` file, so installing `python-dotenv` should not be necessary.

In [1]:
import os
from rentradar.ingest.rentcast_client import RentCastAPIClient

RENTCAST_API_KEY = os.environ.get('RENTCAST_API_KEY')

query_params = {
    "city": "Charlottesville",
    "state": "VA"
}
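
`os.environ.get` returns `None` when the variable is missing, so a misconfigured kernel only surfaces later as a failed API call. A minimal sketch of failing fast instead is shown here; `require_env` is a hypothetical helper and is not part of `rentradar`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; check the .env file and the selected kernel."
        )
    return value


# Usage in this notebook (hypothetical replacement for the os.environ.get call):
# RENTCAST_API_KEY = require_env("RENTCAST_API_KEY")
```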

`RentCastAPIClient` has a classmethod named `create` that pulls the requested data and stores it in an instance of the dataclass. The client also has a method named `to_frame` that returns the data as a pandas DataFrame. The query params take a city and a state. Be aware that `create` paginates: it keeps requesting pages until all data for the city has been retrieved.

**THIS ISN'T FREE. MAKE SURE TO VIEW THE RENTCAST API DOCS**
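
To make the pagination behavior concrete, here is a minimal sketch of an offset-based paging loop. It assumes each request takes a `limit` and an `offset` and that a page shorter than `limit` marks the end of the data. The actual `create` implementation is not shown in this notebook and may differ; `fetch_all` and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for the real HTTP call.

```python
from typing import Callable

Page = list[dict]


def fetch_all(
    fetch_page: Callable[[int, int], Page], limit: int = 500
) -> tuple[list[dict], int]:
    """Collect every record by advancing the offset until a short page arrives.

    Returns the records and the number of requests made.
    """
    records: list[dict] = []
    offset = 0
    requests_made = 0
    while True:
        page = fetch_page(limit, offset)
        requests_made += 1
        records.extend(page)
        if len(page) < limit:  # a short (or empty) page means no more data
            break
        offset += limit
    return records, requests_made


# Stand-in for the real HTTP call: serves slices of an in-memory list.
fake_listings = [{"id": i} for i in range(1200)]


def fake_fetch_page(limit: int, offset: int) -> Page:
    return fake_listings[offset:offset + limit]


records, requests_made = fetch_all(fake_fetch_page, limit=500)
print(len(records), requests_made)  # 1200 3
```

Every pass through the loop is one API request, which is why a large city can use up a request quota quickly.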

### Long Term Rentals

- The following cell returns 3,593 long-term rental listings in Charlottesville; pulling them all took 8 RentCast API requests.
- API docs: https://developers.rentcast.io/reference/rental-listings-long-term
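
The request count follows from the page size: with `limit=500`, 3,593 listings need seven full pages plus one partial page. A small sketch of that arithmetic, useful for estimating cost before running a pull (`estimated_requests` is a hypothetical helper, and the result is a lower bound because retries or an extra empty page would add requests):

```python
import math


def estimated_requests(total_records: int, limit: int = 500) -> int:
    """Lower bound on the number of requests needed to page through total_records."""
    return max(1, math.ceil(total_records / limit))


print(estimated_requests(3593))  # 8
```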

In [2]:
long_term_rentals_endpoint = "/listings/rental/long-term"

ltr = RentCastAPIClient.create(
    endpoint=long_term_rentals_endpoint, 
    query_params=query_params, 
    api_key=RENTCAST_API_KEY, 
    limit=500
)

ltr_df = ltr.to_frame()

INFO:rentradar.ingest.rentcast_client:Starting data fetch for endpoint: /listings/rental/long-term
INFO:rentradar.ingest.rentcast_client:Completed fetching all data for endpoint: /listings/rental/long-term


In [3]:
ltr_df.describe()

statistic,latitude,longitude,bedrooms,bathrooms,squareFootage,price,daysOnMarket,yearBuilt,lotSize
count,3593.0,3593.0,3514.0,3539.0,3072.0,3593.0,3063.0,1006.0,253.0
mean,38.048722,-78.489303,2.589357,2.005651,1545.739583,3003.771,131.890957,1981.985089,36926.83
std,0.03813,0.04347,1.087993,0.910554,800.212154,66705.15,200.155308,31.023226,136367.3
min,37.895933,-78.761071,0.0,1.0,200.0,20.0,1.0,1800.0,436.0
25%,38.024491,-78.507789,2.0,1.0,972.0,1395.0,16.0,1964.0,3920.0
50%,38.036166,-78.492022,3.0,2.0,1350.0,1750.0,43.0,1986.0,7405.0
75%,38.067244,-78.462647,3.0,2.5,2000.0,2200.0,137.0,2007.0,12632.0
max,38.199458,-78.334474,10.0,6.0,7568.0,4000000.0,1728.0,2121.0,1393920.0


In [4]:
ltr_df.dtypes

id                   object
formattedAddress     object
addressLine1         object
addressLine2         object
city                 object
state                object
zipCode              object
county               object
latitude            float64
longitude           float64
propertyType         object
bedrooms            float64
bathrooms           float64
squareFootage       float64
status               object
price                 int64
listedDate           object
removedDate          object
createdDate          object
lastSeenDate         object
daysOnMarket        float64
yearBuilt           float64
lotSize             float64
dtype: object

In [5]:
ltr_df.isnull().sum()

id                     0
formattedAddress       2
addressLine1           0
addressLine2        2356
city                   0
state                  0
zipCode                2
county                35
latitude               0
longitude              0
propertyType           4
bedrooms              79
bathrooms             54
squareFootage        521
status                 0
price                  0
listedDate           530
removedDate          180
createdDate            0
lastSeenDate           0
daysOnMarket         530
yearBuilt           2587
lotSize             3340
dtype: int64

In [6]:
ltr_df.to_csv('../data/cville_long_term_rentals.csv', index=False)

### Properties

- The following cell pulls all property records in Charlottesville, 39,245 in total.
- API docs: https://developers.rentcast.io/reference/property-records
- Property types: https://developers.rentcast.io/reference/property-types

In [9]:
properties_endpoint = "/properties"

properties = RentCastAPIClient.create(
    endpoint=properties_endpoint, 
    query_params=query_params, 
    api_key=RENTCAST_API_KEY, 
    limit=500
)

properties_df = properties.to_frame()

INFO:rentradar.ingest.rentcast_client:Starting data fetch for endpoint: /properties
INFO:rentradar.ingest.rentcast_client:Completed fetching all data for endpoint: /properties


In [15]:
properties_df.describe()

statistic,latitude,longitude,lastSalePrice,bedrooms,bathrooms,squareFootage,lotSize,yearBuilt
count,39245.0,39245.0,12067.0,26756.0,26724.0,26636.0,20034.0,24239.0
mean,38.047849,-78.48584,1263731.0,3.146584,2.431616,1834.032024,81017.04,1980.733529
std,0.040759,0.043238,6368437.0,1.023607,0.987986,4269.540872,946103.7,30.382441
min,37.796512,-78.732894,850.0,0.0,1.0,200.0,1.0,1730.0
25%,38.024046,-78.503983,186962.5,3.0,2.0,1164.0,4356.0,1965.0
50%,38.037984,-78.485854,285000.0,3.0,2.5,1565.5,8059.0,1985.0
75%,38.069459,-78.461811,407908.0,4.0,3.0,2148.0,14810.0,2003.0
max,38.243089,-78.15235,121098100.0,20.0,15.0,336000.0,43516440.0,2121.0


In [14]:
properties_df.isnull().sum()

id                      0
formattedAddress        0
addressLine1            0
addressLine2        28290
city                    0
state                   0
zipCode                 0
county                  2
latitude                0
longitude               0
features             4856
lastSaleDate        27179
lastSalePrice       27178
bedrooms            12489
bathrooms           12521
squareFootage       12609
propertyType        11801
lotSize             19211
yearBuilt           15006
assessorID          16964
legalDescription    17731
subdivision         18184
zoning              17394
taxAssessments      17024
propertyTaxes       17543
owner               16999
ownerOccupied       16999
dtype: int64

In [16]:
properties_df.to_csv('../data/cville_properties.csv', index=False)

### Sale Listings

- The following cell pulls all sale listings in Charlottesville, along with their prices. It returns 6,005 listings.
- API docs: https://developers.rentcast.io/reference/sale-listings

In [17]:
sale_listings_endpoint = "/listings/sale"

sl = RentCastAPIClient.create(
    endpoint=sale_listings_endpoint, 
    query_params=query_params, 
    api_key=RENTCAST_API_KEY, 
    limit=500
)

sl_df = sl.to_frame()

INFO:rentradar.ingest.rentcast_client:Starting data fetch for endpoint: /listings/sale
INFO:rentradar.ingest.rentcast_client:Completed fetching all data for endpoint: /listings/sale


In [18]:
sl_df.describe()

statistic,latitude,longitude,bedrooms,bathrooms,squareFootage,lotSize,yearBuilt,price,daysOnMarket
count,6005.0,6005.0,5648.0,5444.0,5654.0,2565.0,2795.0,6005.0,6005.0
mean,38.060375,-78.489824,3.383676,2.787656,2374.554652,130430.1,1994.269767,564854.8,249.49159
std,0.05177,0.064215,0.993656,0.981178,1219.263286,939610.6,29.934494,566905.0,246.506567
min,37.812802,-78.761071,0.0,1.0,384.0,436.0,1730.0,5000.0,1.0
25%,38.024774,-78.511774,3.0,2.0,1643.0,4356.0,1975.0,330000.0,27.0
50%,38.059013,-78.478488,3.0,2.5,2155.0,7405.0,2001.0,440000.0,148.0
75%,38.100422,-78.44493,4.0,3.5,2795.0,21780.0,2022.0,624900.0,453.0
max,38.243089,-78.15235,16.0,11.5,17647.0,25395480.0,2024.0,12000000.0,2259.0


In [19]:
sl_df.isnull().sum()

id                     0
formattedAddress       0
addressLine1           0
addressLine2        4975
city                   0
state                  0
zipCode                0
county                 0
latitude               0
longitude              0
propertyType          11
bedrooms             357
bathrooms            561
squareFootage        351
lotSize             3440
yearBuilt           3210
status                 0
price                  0
listedDate             0
removedDate          220
createdDate            0
lastSeenDate           0
daysOnMarket           0
dtype: int64

In [20]:
sl_df.to_csv('../data/cville_sale_listings.csv', index=False)

### Market Statistics

- The following endpoint returns aggregate rental listing data, averages, listing statistics, and historical trends for a single zip code.
- API docs: https://developers.rentcast.io/reference/market-statistics
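
Because the endpoint covers one zip code per request, covering a whole city means one call per zip code, and some of those calls can fail. A minimal sketch of such a loop follows, assuming a fetcher that returns `None` on failure. `collect_market_stats` and `fake_fetch_market` are hypothetical names, the response values are made up, and the real `process_markets_endpoint` may work differently.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


def collect_market_stats(
    zip_codes: Iterable[str],
    fetch_market: Callable[[str], Optional[dict]],
) -> tuple[dict[str, dict], list[str]]:
    """Call fetch_market once per zip code, keeping successes and recording failures."""
    results: dict[str, dict] = {}
    failed: list[str] = []
    for zip_code in zip_codes:
        data = fetch_market(zip_code)
        if data is None:
            logger.error("Failed to fetch data for zip code: %s", zip_code)
            failed.append(zip_code)
            continue
        results[zip_code] = data
    return results, failed


# Stand-in for the real API call, with made-up values for illustration only.
fake_responses = {"22901": {"averageRent": 1}, "22903": {"averageRent": 2}}


def fake_fetch_market(zip_code: str) -> Optional[dict]:
    return fake_responses.get(zip_code)


results, failed = collect_market_stats(["22901", "22903", "22904"], fake_fetch_market)
print(sorted(results), failed)  # ['22901', '22903'] ['22904']
```

Returning the failed zip codes alongside the results makes it easy to retry only those rather than repeating every request.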

In [2]:
import pandas as pd

rc_client = RentCastAPIClient(api_key=RENTCAST_API_KEY)

properties_df = pd.read_csv('../data/cville_properties.csv')

zipcodes = properties_df.zipCode.unique()

query_params = {"zipCodes": zipcodes, "historyRange": 36}

current_stats, historical_stats = rc_client.process_markets_endpoint(
    endpoint="/markets",
    query_params=query_params
)

ERROR:rentradar.ingest.rentcast_client:Failed to fetch data for zip code: 22904
ERROR:rentradar.ingest.rentcast_client:Failed to fetch data for zip code: 22906
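
One thing to watch when the zip codes come from a CSV: `pd.read_csv` infers `zipCode` as an integer column by default, which is why the summary statistics further down report a mean and standard deviation for it. That is harmless for Charlottesville zip codes, but it would drop leading zeros elsewhere. A minimal sketch of keeping zip codes as strings, using made-up inline data rather than the notebook's files:

```python
import io

import pandas as pd

# Made-up sample rows; the second zip code starts with a zero.
csv_text = "zipCode,city\n22901,Charlottesville\n02134,Boston\n"

as_numbers = pd.read_csv(io.StringIO(csv_text))
as_strings = pd.read_csv(io.StringIO(csv_text), dtype={"zipCode": str})

print(as_numbers["zipCode"].tolist())  # [22901, 2134]
print(as_strings["zipCode"].tolist())  # ['22901', '02134']
```

Passing `dtype={"zipCode": str}` when reading `cville_properties.csv` would keep the column consistent with the `object` dtype the API data had before it was written to disk.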


In [3]:
current_stats.describe()

statistic,bedrooms,averageRent,minRent,maxRent,totalListings,zipCode
count,45.0,45.0,45.0,45.0,45.0,45.0
mean,2.8,2281.288889,1890.511111,3020.222222,9.866667,22998.444444
std,1.603972,968.99014,1030.943712,1410.51085,13.610825,347.771905
min,0.0,750.0,610.0,750.0,1.0,22901.0
25%,2.0,1525.0,1250.0,1750.0,1.0,22903.0
50%,3.0,2260.0,1650.0,2850.0,4.0,22923.0
75%,4.0,2681.0,2400.0,3600.0,11.0,22942.0
max,8.0,5450.0,5450.0,6500.0,60.0,24590.0


In [4]:
current_stats.head()

index,bedrooms,averageRent,minRent,maxRent,totalListings,lastUpdatedDate,zipCode
0,1,1439,610,3000,35,2024-02-24T00:00:00.000Z,22903
1,2,1734,875,2850,60,2024-02-24T00:00:00.000Z,22903
2,3,2260,1425,4000,46,2024-02-24T00:00:00.000Z,22903
3,4,2667,690,5800,20,2024-02-24T00:00:00.000Z,22903
4,5,3800,3000,4700,7,2024-02-24T00:00:00.000Z,22903


In [5]:
historical_stats.describe()

statistic,bedrooms,averageRent,minRent,maxRent,totalListings,zipCode
count,1706.0,1706.0,1706.0,1706.0,1706.0,1706.0
mean,2.783118,2184.987403,1759.187573,2789.412661,10.430832,22986.381594
std,1.829981,1188.057205,1196.744356,1514.572759,15.693328,322.455422
min,0.0,675.0,494.0,675.0,1.0,22901.0
25%,1.0,1381.5,995.0,1700.0,1.0,22903.0
50%,3.0,1911.03,1409.5,2475.0,4.0,22911.0
75%,4.0,2582.5,2200.0,3500.0,12.0,22942.0
max,10.0,8325.0,8325.0,10000.0,105.0,24590.0


In [6]:
historical_stats.head()

index,bedrooms,averageRent,minRent,maxRent,totalListings,date,zipCode
0,0,916.67,515,1299,3,2021-03,22903
1,1,1148.33,600,1560,12,2021-03,22903
2,2,1653.7,950,4000,10,2021-03,22903
3,3,1664.0,494,2700,11,2021-03,22903
4,4,1142.5,585,1700,2,2021-03,22903


In [7]:
current_stats.to_csv('../data/cville_current_market_stats.csv', index=False)

In [8]:
historical_stats.to_csv('../data/cville_historical_market_stats.csv', index=False)