# Download Historical Precipitation Data

This Jupyter Notebook retrieves the historical precipitation data that serve as a critical input to the hydrological modeling components of this study. The primary objective is to download high-resolution precipitation records, specifically 15-minute precipitation data, from the National Oceanic and Atmospheric Administration (NOAA) public data server, accessed over HTTPS. The notebook loads all precipitation gauge stations located within, or influencing, the previously defined study region and then downloads the complete available historical record for each selected station. This collection of precipitation data is essential for characterizing storm events, driving the ModClark hydrological model simulations, and selecting appropriate events for model calibration and evaluation.

In [2]:
# Import necessary modules
import geopandas as gpd
from pathlib import Path
import sys
import warnings
from IPython.display import display, Markdown


In [3]:
# Project root: two directories above the notebook's working directory
project_root_path = Path.cwd().parent.parent

# Suppress warnings
warnings.filterwarnings("ignore")

In [4]:
# Add 'src' to system path
sys.path.append(str(project_root_path / 'src'))

# Import modules
from data_download.download import download_large_file
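`download_large_file` is a project helper imported from `src/data_download/download.py`; its source is not shown in this notebook. Its log messages (a starting byte offset such as `417792/16687820 bytes`, HTTP status codes, and the `max_retries`/`chunk_size` arguments) suggest that it resumes partially downloaded files with HTTP `Range` requests. The block below is a minimal standard-library sketch of that idea, not the project's implementation: the names `range_header` and `resumable_download` are hypothetical, and treating status 416 as "file already complete" is an assumption.

```python
import http.client
import time
import urllib.error
import urllib.request
from pathlib import Path


def range_header(existing_bytes: int) -> dict:
    """Return a Range header that resumes after `existing_bytes`, or {} for a fresh download."""
    return {"Range": f"bytes={existing_bytes}-"} if existing_bytes > 0 else {}


def resumable_download(url: str, destination: Path, max_retries: int = 3,
                       chunk_size: int = 8192, timeout: float = 30.0) -> bool:
    """Hypothetical sketch: download `url` to `destination`, resuming a partial file.

    Returns True once the file is complete and False after `max_retries` failed attempts.
    """
    destination = Path(destination)
    for attempt in range(1, max_retries + 1):
        existing = destination.stat().st_size if destination.exists() else 0
        request = urllib.request.Request(url, headers=range_header(existing))
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                # 206 Partial Content: the server honoured the Range header, so append;
                # 200 OK: the server sent the whole file, so overwrite
                mode = "ab" if response.status == 206 else "wb"
                with open(destination, mode) as fh:
                    while chunk := response.read(chunk_size):
                        fh.write(chunk)
            return True
        except urllib.error.HTTPError as err:
            if err.code == 416 and existing > 0:
                # Range Not Satisfiable: the requested start lies past the end of the
                # remote file; here that is assumed to mean the local copy is complete
                return True
            print(f"Attempt {attempt}/{max_retries} failed: HTTP {err.code}")
        except (OSError, http.client.HTTPException) as err:
            # Network errors (URLError, timeouts, resets) and truncated responses
            print(f"Attempt {attempt}/{max_retries} failed: {err}")
        if attempt < max_retries:
            time.sleep(2 ** attempt)  # exponential back-off before the next attempt
    return False
```

Because each attempt re-reads the local file size, a transfer interrupted midway resumes from wherever the previous attempt stopped instead of starting over.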

#### Load Precipitation Stations Inventory

In [5]:
# Load metadata for the historical precipitation (version 2) stations in the study area
ppt_stn_path = project_root_path / 'data/silver/geo/gpkg/study_area_ppt_stn.gpkg'
ppt_stn = gpd.read_file(ppt_stn_path)
ppt_stn.head()

| | index | StnID | Lat | Lon | Elev | State/Province | Name | WMO_ID | Sample_Interval (min) | UTC_Offset | POR_Date_Range | PCT_POR_Good | Last_Half_POR | PCT_Last_Half_Good | Last_Qtr_POR | PCT_Last_Qtr_Good | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11 | USC00011099 | 34.9809 | -85.8101 | 203.3 | AL | BRIDGEPORT 5 NW | | 15 | -6 | 19821101-20250109 | 87.2% | 20031206-20250109 | 93.7% | 20140623-20250109 | 96.2% | POINT (-85.8101 34.9809) |
| 1 | 1540 | USC00406162 | 35.2243 | -85.8414 | 563.9 | TN | MONTEAGLE | | 15 | -6 | 19820601-20250108 | 89.8% | 20030920-20250108 | 88.3% | 20140515-20250108 | 94.9% | POINT (-85.8414 35.2243) |
| 2 | 1535 | USC00405187 | 35.414 | -86.8086 | 239.9 | TN | LEWISBURG EXP STA | | 15 | -6 | 19710501-20250117 | 81.8% | 19980310-20250117 | 71.2% | 20110814-20250117 | 61.8% | POINT (-86.8086 35.414) |
| 3 | 1551 | USC00408540 | 35.6763 | -84.8547 | 230.1 | TN | SPRING CITY | | 15 | -5 | 20030501-20250106 | 92.8% | 20140304-20250106 | 95.4% | 20190805-20250106 | 90.9% | POINT (-84.8547 35.6763) |
| 4 | 1525 | USC00401587 | 35.7553 | -87.4261 | 201.2 | TN | CENTERVILLE WATER PL | | 15 | -6 | 19820401-20060726 | 83.6% | 19940529-20060726 | 75.8% | 20000626-20060726 | 69.4% | POINT (-87.4261 35.7553) |
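Several metadata columns in this table are stored as text: `POR_Date_Range` (and the `Last_*_POR` columns) hold a period of record as `YYYYMMDD-YYYYMMDD`, and the `PCT_*_Good` columns hold percentages such as `87.2%`. The sketch below converts such values into dates, an approximate record length in years, and plain floats, which makes it easier to compare stations. The helper names are hypothetical; on the GeoDataFrame the same functions could be applied column-wise, for example `ppt_stn['POR_Date_Range'].map(por_years)`.

```python
from datetime import date, datetime


def parse_por(date_range: str) -> tuple[date, date]:
    """Split a 'YYYYMMDD-YYYYMMDD' period-of-record string into start and end dates."""
    start_text, end_text = date_range.split("-")
    start = datetime.strptime(start_text, "%Y%m%d").date()
    end = datetime.strptime(end_text, "%Y%m%d").date()
    return start, end


def por_years(date_range: str) -> float:
    """Approximate length of the period of record in years (365.25-day years)."""
    start, end = parse_por(date_range)
    return (end - start).days / 365.25


def pct_to_float(value: str) -> float:
    """Convert a percentage string such as '87.2%' to the float 87.2."""
    return float(value.rstrip("%"))


# Values taken from the first row of the table (BRIDGEPORT 5 NW)
start, end = parse_por("19821101-20250109")
print(start, end)                                 # 1982-11-01 2025-01-09
print(round(por_years("19821101-20250109"), 1))   # 42.2
print(pct_to_float("87.2%"))                      # 87.2
```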


In [7]:
# Number of stations (rows)
display(Markdown(f'There are **{ppt_stn.shape[0]}** precipitation stations either within or influencing the study area.'))

There are **260** precipitation stations either within or influencing the study area.

In [8]:
# Extract the station IDs
stn_codes = ppt_stn['StnID']
stn_codes

0      USC00011099
1      USC00406162
2      USC00405187
3      USC00408540
4      USC00401587
          ...     
255    USC00128992
256    USC00123418
257    USC00124837
258    USC00124730
259    USC00120200
Name: StnID, Length: 260, dtype: object

#### Download Data

In [9]:
# Download data
base_url = 'https://www1.ncdc.noaa.gov/pub/data/hpd/auto/v2/beta/15min/all_csv/'
destination_folder = project_root_path / 'data/bronze/tabular/precipitation'
destination_folder.mkdir(parents=True, exist_ok=True)
length = len(stn_codes)

for count, stn in enumerate(stn_codes, start=1):
    # Remote files are named '<StnID>.15m.csv'; they are saved locally as '<StnID>.csv'
    file_name = destination_folder / f'{stn}.csv'
    url = f'{base_url}{stn}.15m.csv'

    print(f" {count}/{length} - downloading {stn} from: {url}")
    try:
        download_large_file(url=url, destination=file_name, max_retries=3, chunk_size=8192)
    except Exception as err:
        # Report the failure and continue with the next station
        print(f'Failed to download {stn}: {err}')

 1/260 - downloading USC00011099 from: https://www1.ncdc.noaa.gov/pub/data/hpd/auto/v2/beta/15min/all_csv/USC00011099.15m.csv
Failed to download file. Server responded with status code 416.

Function 'download_large_file' executed in 0.6343 seconds.
 2/260 - downloading USC00406162 from: https://www1.ncdc.noaa.gov/pub/data/hpd/auto/v2/beta/15min/all_csv/USC00406162.15m.csv
Failed to download file. Server responded with status code 416.

Function 'download_large_file' executed in 0.6125 seconds.
 3/260 - downloading USC00405187 from: https://www1.ncdc.noaa.gov/pub/data/hpd/auto/v2/beta/15min/all_csv/USC00405187.15m.csv
Starting download: /Users/alan/Data Science Projects/ML-ModClark-IUH-Model/data/bronze/tabular/precipitation/USC00405187.csv (417792/16687820 bytes)
Downloaded 16687820/16687820 bytes (100.00%)
Download completed: /Users/alan/Data Science Projects/ML-ModClark-IUH-Model/data/bronze/tabular/precipitation/USC00405187.csv

Function 'download_large_file' executed in 3.6075 sec
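
Each call in this log ends with a line such as `Function 'download_large_file' executed in 0.6343 seconds.`, which suggests the helper is wrapped in a timing decorator. The project's implementation is not shown; one common way to produce that kind of message is a decorator built on `functools.wraps` and `time.perf_counter`, sketched here under the hypothetical name `timed`.

```python
import functools
import time


def timed(func):
    """Hypothetical decorator that reports how long each call to `func` takes."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if `func` raises, so failed calls are timed too
            elapsed = time.perf_counter() - start
            print(f"Function '{func.__name__}' executed in {elapsed:.4f} seconds.")
    return wrapper


@timed
def slow_add(a, b):
    """Toy function used only to demonstrate the decorator."""
    time.sleep(0.05)
    return a + b


print(slow_add(2, 3))
```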

The first download pass printed failure messages (HTTP status code 416) for some stations. Cell `In [13]` therefore retries every station whose CSV file is not yet present in the destination folder.
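The retry cell decides what to retry by checking whether each CSV file exists. An alternative is to record failures during the first pass and retry only those station IDs. The sketch below assumes a `download(url, destination)` callable and counts a raised exception or an explicit `False` return value as a failure; `download_all`, `download_with_retries`, and the stub downloader are hypothetical. Note that the log suggests `download_large_file` reports HTTP errors itself instead of raising (its own message `Failed to download file. Server responded with status code 416.` appears, but no `Failed to download USC...` line from the `except` branch), so with that helper failures would only be caught if its return value signals them, which is not shown here.

```python
from pathlib import Path
from typing import Callable, Iterable

BASE_URL = "https://www1.ncdc.noaa.gov/pub/data/hpd/auto/v2/beta/15min/all_csv/"


def download_all(stn_ids: Iterable[str], folder: Path,
                 download: Callable[[str, Path], object]) -> list[str]:
    """Try every station once and return the IDs whose download failed."""
    failed = []
    for stn in stn_ids:
        url = f"{BASE_URL}{stn}.15m.csv"
        destination = Path(folder) / f"{stn}.csv"
        try:
            ok = download(url, destination)
        except Exception as err:  # keep going; record the failure instead
            print(f"Failed to download {stn}: {err}")
            ok = False
        if ok is False:
            failed.append(stn)
    return failed


def download_with_retries(stn_ids: Iterable[str], folder: Path,
                          download: Callable[[str, Path], object],
                          passes: int = 3) -> list[str]:
    """Re-run download_all on the previous pass's failures, up to `passes` times."""
    remaining = list(stn_ids)
    for _ in range(passes):
        if not remaining:
            break
        remaining = download_all(remaining, folder, download)
    return remaining


# Example with a stub downloader that fails the first time it sees each station
attempts: dict[str, int] = {}


def flaky_download(url: str, destination: Path) -> bool:
    stn = destination.stem
    attempts[stn] = attempts.get(stn, 0) + 1
    return attempts[stn] > 1


still_failed = download_with_retries(["USC00011099", "USC00406162"], Path("."), flaky_download)
print(still_failed)  # []
```

In the notebook, `download` could be a small wrapper such as `lambda url, dest: download_large_file(url=url, destination=dest, max_retries=3, chunk_size=8192)`, subject to the caveat about its return value above.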

In [13]:
# Retry stations whose CSV file is still missing
base_url = 'https://www1.ncdc.noaa.gov/pub/data/hpd/auto/v2/beta/15min/all_csv/'
for count, stn in enumerate(stn_codes, start=1):
    StnID_extension = f'{stn}.csv'
    file_name = destination_folder / StnID_extension
    # Only missing files are retried; an existing file is assumed to be complete
    if file_name.exists():
        print(f'{StnID_extension} already exists, skipping to next file')
        continue

    url = f'{base_url}{stn}.15m.csv'
    print(f" {count}/{len(stn_codes)} - downloading {stn} from: {url}")
    try:
        download_large_file(url=url, destination=file_name, max_retries=3, chunk_size=8192)
    except Exception as err:
        print(f'Failed to download {stn}: {err}')

print('\nLoop finished executing. Check the log for failed downloads.')

USC00011099.csv already exists, skipping to next file
USC00406162.csv already exists, skipping to next file
USC00405187.csv already exists, skipping to next file
USC00408540.csv already exists, skipping to next file
USC00401587.csv already exists, skipping to next file
USC00407811.csv already exists, skipping to next file
USC00401585.csv already exists, skipping to next file
USC00406371.csv already exists, skipping to next file
USC00402489.csv already exists, skipping to next file
USC00406170.csv already exists, skipping to next file
USC00405108.csv already exists, skipping to next file
USC00408766.csv already exists, skipping to next file
USC00401480.csv already exists, skipping to next file
USC00406806.csv already exists, skipping to next file
USC00401663.csv already exists, skipping to next file
USC00405332.csv already exists, skipping to next file
USC00409493.csv already exists, skipping to next file
USC00151631.csv already exists, skipping to next file
USC00150611.csv already exis

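Although the first download pass reported failures for some stations, the retry log shows every station's file as already present, so all the data were downloaded successfully. The reported failures were HTTP 416 (Range Not Satisfiable) responses, which a server returns when a request asks for a byte range that starts beyond the end of the file. This suggests that the affected files had already been downloaded in full during an earlier run and the helper was attempting to resume past their end.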
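Instead of scanning the log by eye, the destination folder can be checked directly. The sketch below, using the hypothetical helper `find_incomplete`, lists stations whose CSV file is missing or empty. A non-empty file can still be truncated, so this is only a first-pass check; comparing each file against the size reported by the server would be stricter.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def find_incomplete(stn_ids: Iterable[str], folder: Path) -> dict[str, list[str]]:
    """Return station IDs whose '<StnID>.csv' file is missing or zero bytes."""
    folder = Path(folder)
    missing, empty = [], []
    for stn in stn_ids:
        path = folder / f"{stn}.csv"
        if not path.exists():
            missing.append(stn)
        elif path.stat().st_size == 0:
            empty.append(stn)
    return {"missing": missing, "empty": empty}


# Example in a temporary folder: one non-empty file, one empty file, one absent file
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "USC00011099.csv").write_text("header\n1,2\n")
    (tmp_path / "USC00406162.csv").touch()
    report = find_incomplete(["USC00011099", "USC00406162", "USC00405187"], tmp_path)

print(report)  # {'missing': ['USC00405187'], 'empty': ['USC00406162']}
```

In this notebook the call would be `find_incomplete(stn_codes, destination_folder)`.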