Main driver script for extracting solar resource data from the NREL NSRDB (National Solar Radiation Database) API.

This script:
1. Queries the API and either saves download URLs or stores ZIP files in `nsrdb_zip_cache/`.
2. Downloads data from stored URLs and saves them as `.xlsx` files in `raw_nsrdb_data/`.
3. Extracts ZIP files (if any) and organizes them into the appropriate folders.
4. Deletes the `nsrdb_zip_cache/` folder.

All final `.xlsx` files are stored in the `raw_nsrdb_data/` directory. 
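Steps 3 and 4 above (extracting cached ZIP files and then deleting the cache) need only the standard library. Below is a minimal sketch. `extract_and_clear_cache` is a hypothetical helper, not a function from the `scripts` module. It assumes every archive in `nsrdb_zip_cache/` can be extracted directly into `raw_nsrdb_data/`, whereas the real code may sort files into subfolders.

```python
import os
import shutil
import zipfile


def extract_and_clear_cache(cache_dir="nsrdb_zip_cache", out_dir="raw_nsrdb_data"):
    """Extract every ZIP in cache_dir into out_dir, then delete cache_dir.

    Hypothetical helper: folder names come from the steps above, but the
    real scripts module may organize the extracted files differently.
    """
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    if os.path.isdir(cache_dir):
        for name in sorted(os.listdir(cache_dir)):
            if not name.lower().endswith(".zip"):
                continue  # ignore anything that is not a ZIP archive
            with zipfile.ZipFile(os.path.join(cache_dir, name)) as zf:
                zf.extractall(out_dir)
                extracted.extend(zf.namelist())
        # Step 4: remove the cache folder and everything left in it
        shutil.rmtree(cache_dir)
    return extracted
```

Returning the extracted member names makes it easy to log or check what was unpacked before the cache is gone.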

## 1. Extracting Data

In [1]:
import scripts as scr
import os

In [None]:
# Step 1: Query the NSRDB API
scr.query_api()

--- YEAR 2021 ---
Requesting south: McAllen (2021)...
Queued: File generation in progress. An email will be sent to kmundey@utexas.edu when the download is ready.

Requesting south: Austin (2021)...
Queued: File generation in progress. An email will be sent to kmundey@utexas.edu when the download is ready.

Requesting south: SanAntonio (2021)...
Queued: File generation in progress. An email will be sent to kmundey@utexas.edu when the download is ready.

Requesting south: Laredo (2021)...
Queued: File generation in progress. An email will be sent to kmundey@utexas.edu when the download is ready.

Requesting south: CorpusChristi (2021)...
Queued: File generation in progress. An email will be sent to kmundey@utexas.edu when the download is ready.


Requesting north: Waco (2021)...
Queued: File generation in progress. An email will be sent to kmundey@utexas.edu when the download is ready.

Requesting north: Dallas (2021)...
Queued: File generation in progress. An email will be sent to kmun
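The log shows `query_api()` looping over years, then regions, then cities, and sending one request per combination. The sketch below mirrors that loop without any network access. The `REGIONS` mapping is copied from the 2021 city names printed in this notebook's download log. `query_all` and the `request_fn` callback are hypothetical stand-ins for whatever `scr.query_api()` does internally, including the actual NSRDB API call.

```python
# Hypothetical region -> city mapping, taken from the city names in the download log
REGIONS = {
    "south": ["McAllen", "Austin", "SanAntonio", "Laredo", "CorpusChristi"],
    "north": ["Waco", "Dallas", "Tyler"],
    "west": ["Amarillo", "Lubbock", "Midland", "SanAngelo", "WichitaFalls", "Alpine"],
    "east": ["Houston"],
}


def query_all(years, regions, request_fn):
    """Call request_fn once per (year, region, city) and collect its results.

    request_fn is a placeholder for the real API request; it receives the
    year, region, and city and may return e.g. a download URL or a status.
    """
    results = {}
    for year in years:
        print(f"--- YEAR {year} ---")
        for region, cities in regions.items():
            for city in cities:
                print(f"Requesting {region}: {city} ({year})...")
                results[(year, region, city)] = request_fn(year, region, city)
            print()  # blank line between regions
    return results
```

Keying the results by `(year, region, city)` keeps the metadata needed to name each downloaded file later.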

In [3]:
# Step 2: Download the .csv files from the stored download URLs
scr.process_urls()
print('Finished downloading NSRDB data into the folder `raw_nsrdb_data`.')

--- YEAR 2021 ---
Downloading nsrdb_2021_south_McAllen.csv

Downloading nsrdb_2021_south_Austin.csv

Downloading nsrdb_2021_south_SanAntonio.csv

Downloading nsrdb_2021_south_Laredo.csv

Downloading nsrdb_2021_south_CorpusChristi.csv


Downloading nsrdb_2021_north_Waco.csv

Downloading nsrdb_2021_north_Dallas.csv

Downloading nsrdb_2021_north_Tyler.csv


Downloading nsrdb_2021_west_Amarillo.csv

Downloading nsrdb_2021_west_Lubbock.csv

Downloading nsrdb_2021_west_Midland.csv

Downloading nsrdb_2021_west_SanAngelo.csv

Downloading nsrdb_2021_west_WichitaFalls.csv

Downloading nsrdb_2021_west_Alpine.csv


Downloading nsrdb_2021_east_Houston.csv



--- YEAR 2022 ---
Downloading nsrdb_2022_south_McAllen.csv

Downloading nsrdb_2022_south_Austin.csv

Downloading nsrdb_2022_south_SanAntonio.csv

Downloading nsrdb_2022_south_Laredo.csv

Downloading nsrdb_2022_south_CorpusChristi.csv


Downloading nsrdb_2022_north_Waco.csv

Downloading nsrdb_2022_north_Dallas.csv

Downloading nsrdb_2022_north_T
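Each line of the log above corresponds to one stored URL being fetched and written under the pattern `nsrdb_<year>_<region>_<city>.csv`. A minimal standard-library sketch of a single download follows. `download_csv` is hypothetical. It keeps the `.csv` name printed in the log and does not attempt any `.xlsx` handling that `process_urls()` may perform.

```python
import os
import shutil
import urllib.request


def download_csv(url, year, region, city, out_dir="raw_nsrdb_data"):
    """Download one file from `url` and save it as nsrdb_<year>_<region>_<city>.csv."""
    os.makedirs(out_dir, exist_ok=True)
    filename = f"nsrdb_{year}_{region}_{city}.csv"
    print(f"Downloading {filename}")
    path = os.path.join(out_dir, filename)
    # Stream the response body straight to disk instead of loading it into memory
    with urllib.request.urlopen(url) as response, open(path, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return path
```

Streaming with `shutil.copyfileobj` keeps memory use flat even for a full year of hourly data.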

## 2. Aggregate the Data into Single Files

#### 2a. Aggregate the data by city

Combines all city-level NSRDB Excel files from multiple years into a single DataFrame, adds region and city metadata, generates a timeseries column (`YYYY-MM-DD-HH`), and saves the result to a CSV file in the `clean_data` directory one level up (`../clean_data`).
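The metadata and timeseries steps described above can be illustrated on a single city's data. This is a minimal sketch. It assumes each raw file has integer `Year`, `Month`, `Day`, and `Hour` columns, as NSRDB PSM3 CSV downloads typically do. `add_metadata` is a hypothetical helper, and the column names `region`, `city`, and `timeseries` are assumptions about what `aggregate_one_year_by_city` produces.

```python
import pandas as pd


def add_metadata(df, region, city):
    """Return a copy of df with region, city, and a YYYY-MM-DD-HH timeseries column.

    Assumes df has integer Year, Month, Day, and Hour columns.
    """
    out = df.copy()  # leave the caller's DataFrame unchanged
    out["region"] = region
    out["city"] = city
    # Zero-pad each part so the strings sort chronologically
    out["timeseries"] = (
        out["Year"].astype(str).str.zfill(4)
        + "-" + out["Month"].astype(str).str.zfill(2)
        + "-" + out["Day"].astype(str).str.zfill(2)
        + "-" + out["Hour"].astype(str).str.zfill(2)
    )
    return out
```

Zero-padding is what makes the string key sortable: `2021-01-05-00` sorts before `2021-12-31-23`, which would not hold for unpadded values.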

In [1]:
import scripts as scr
import os
import pandas as pd

In [2]:
# Process each year of data, creating one dataframe containing every city
df_2021 = scr.aggregate_one_year_by_city('2021')
df_2022 = scr.aggregate_one_year_by_city('2022')
df_2023 = scr.aggregate_one_year_by_city('2023')

# Combine all years into one timeseries dataframe
city_df = pd.concat([df_2021, df_2022, df_2023])
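To build one year's DataFrame, `aggregate_one_year_by_city` has to find that year's files and recover each file's region and city. One way to do this is to parse the file names. The sketch below assumes the `nsrdb_<year>_<region>_<city>` pattern seen in the download log, with either a `.csv` or `.xlsx` extension. `FILENAME_RE` and `files_for_year` are hypothetical names.

```python
import os
import re

# Matches names like nsrdb_2021_south_Austin.csv or .xlsx (pattern from the download log)
FILENAME_RE = re.compile(r"^nsrdb_(\d{4})_([a-z]+)_([A-Za-z]+)\.(csv|xlsx)$")


def files_for_year(year, data_dir="raw_nsrdb_data"):
    """List (path, region, city) for every file in data_dir that belongs to `year`."""
    matches = []
    for name in sorted(os.listdir(data_dir)):
        m = FILENAME_RE.match(name)
        if m and m.group(1) == str(year):
            matches.append((os.path.join(data_dir, name), m.group(2), m.group(3)))
    return matches
```

Files that do not match the pattern, such as notes or temporary files, are skipped silently rather than raising an error.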

In [3]:
# Create clean_data folder
clean_data_path = os.path.join('..', 'clean_data')
os.makedirs(clean_data_path, exist_ok=True)

# Save to CSV
output_path = os.path.join(clean_data_path, 'nsrdb_by_city.csv')
city_df.to_csv(output_path)

#### 2b. Aggregate the data by region

In [4]:
# Create one dataframe containing region-level data
regions_df = scr.aggregate_by_region(city_df)

# Save to CSV
regions_df.to_csv(os.path.join(clean_data_path, "nsrdb_by_region.csv"))
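The notebook does not show how `aggregate_by_region` combines cities. A common choice is an unweighted hourly mean over all cities in a region. The sketch below implements that as an assumption only: `aggregate_by_region_sketch` is hypothetical, and the real function may sum, weight, or select columns differently. It assumes `city_df` has `region` and `timeseries` columns plus numeric measurement columns.

```python
import pandas as pd


def aggregate_by_region_sketch(city_df):
    """Average every numeric column across the cities of each region, per hour.

    Assumes city_df has 'region' and 'timeseries' columns; non-numeric
    columns such as 'city' are dropped from the result.
    """
    numeric_cols = list(city_df.select_dtypes("number").columns)
    return (
        city_df.groupby(["region", "timeseries"], as_index=False)[numeric_cols]
        .mean()
    )
```

Grouping on `timeseries` rather than the DataFrame index avoids relying on index values, which `pd.concat` repeats across years unless `ignore_index=True` is passed.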