# Snow Pit Data Access and SWE Calculation

This notebook demonstrates how to access snow pit data gathered during the SnowEx field campaigns. It contains two examples: a simple use case with `earthaccess` and snow depth data, and a more advanced example that uses the SnowEx database (`snowexsql`) to obtain snow depth and density.

Note that the `snowexsql` example is currently a work in progress, using code from a SnowEx Database example found here: https://snowexsql.readthedocs.io/en/latest/gallery/plot_pit_swe_example.html
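
Since the goal is to combine snow depth and density into snow water equivalent (SWE), here is a minimal sketch of that relationship. It assumes a single bulk density for the whole snowpack; `swe_mm` is a hypothetical helper written for illustration and is not part of `snowexsql` or `earthaccess`.

```python
def swe_mm(depth_cm: float, density_kg_m3: float) -> float:
    """Snow water equivalent in mm from snow depth (cm) and bulk density (kg/m^3)."""
    # depth [m] * density [kg/m^3] gives water mass per unit area [kg/m^2],
    # and 1 kg/m^2 of liquid water corresponds to a 1 mm deep layer.
    return (depth_cm / 100.0) * density_kg_m3


# Illustrative values only (not SnowEx measurements)
example = swe_mm(depth_cm=100.0, density_kg_m3=300.0)
print(f"SWE = {example:.0f} mm")
```

For layered pit data, the same relationship could be applied per layer and summed over the profile.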

In [None]:
!pip install contextily cmcrameri

In [None]:
import earthaccess
import pandas as pd
import geopandas as gpd
import os
import tempfile
from shapely.geometry import Point
import shutil
import cmcrameri.cm as cmc
import matplotlib.pyplot as plt
import contextily as ctx

## earthaccess example
For the earthaccess example, we use the DOI of the "SnowEx20 Community Snow Depth Probe Measurements, Version 1" dataset, collected at Grand Mesa, CO. The files are stored in CSV format and contain snow depths measured with magnaprobes, Mesa2 tablets, and pit rulers.

Note that the `short_name` of the dataset is also provided (commented out) in the cell below. It may be used as an alternative to the DOI, if desired.

In [None]:
# Authenticate with Earthdata Login servers
auth = earthaccess.login(strategy="interactive")

# Search for granules
results = earthaccess.search_data(
    #short_name="SNEX20_SD",
    doi = "10.5067/9IA978JIACAR",
    temporal=('2020-01-01', '2020-03-01'),
)

In [None]:
display(results[0])

Our query returned a single CSV file over the time frame of interest. Note that we did not include a spatial bound in the search, because this dataset was only collected over Grand Mesa, CO.

To obtain the data, we will create a temporary directory (`tempfile.mkdtemp()`), and load the data into a GeoDataFrame. The temporary directory (and data within) will be deleted after processing.

In [None]:
# Create a temporary directory for downloads
temp_dir = tempfile.mkdtemp()
print(f"Using temporary directory: {temp_dir}")

# Download the data to the temp directory
downloaded_files = earthaccess.download(
    results,
    local_path=temp_dir,
)
print(f"Downloaded {len(downloaded_files)} files to {temp_dir}")

# Process CSV files and convert to a single GeoDataFrame
# (str() makes this work whether download() returns strings or Path objects)
csv_files = [str(file) for file in downloaded_files if str(file).endswith('.csv')]
gdf_list = []
for csv_file in csv_files:
    print(f"Processing: {os.path.basename(csv_file)}")

    # Read the csv file
    tmp_df = pd.read_csv(csv_file)

    # Convert to GeoDataFrame (coordinates are WGS 84 / UTM zone 12N, EPSG:32612)
    geometry = [Point(xy) for xy in zip(tmp_df['Easting'], tmp_df['Northing'])]
    gdf_list.append(gpd.GeoDataFrame(tmp_df, geometry=geometry, crs="EPSG:32612"))

# Combine all files in one step, rather than concatenating inside the loop
gdf = pd.concat(gdf_list, ignore_index=True) if gdf_list else gpd.GeoDataFrame()

print("All files processed.")
print(' ')
print(f"Removing temporary directory: {temp_dir}")
shutil.rmtree(temp_dir)
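
The cell above removes the temporary directory manually, so the directory is left behind if an error occurs before `shutil.rmtree` runs. As an alternative sketch, `tempfile.TemporaryDirectory` can be used as a context manager so cleanup always happens. The helper names `process_in_temp_dir` and `write_and_count` are hypothetical, and the small CSV written here is only a stand-in for the files that `earthaccess.download` would produce.

```python
import os
import tempfile


def process_in_temp_dir(process):
    """Run process(temp_dir) in a temporary directory that is always cleaned up."""
    with tempfile.TemporaryDirectory() as temp_dir:
        result = process(temp_dir)
    # The directory and its contents are deleted once the with-block exits
    return result, temp_dir


def write_and_count(temp_dir):
    # Stand-in for the download step: write one small CSV file
    path = os.path.join(temp_dir, "example.csv")
    with open(path, "w") as f:
        f.write("Easting,Northing\n1,2\n")
    return len(os.listdir(temp_dir))


n_files, used_dir = process_in_temp_dir(write_and_count)
print(f"Processed {n_files} file(s); directory still exists: {os.path.exists(used_dir)}")
```

Any data needed later (for example the GeoDataFrame) has to be loaded into memory inside the `with` block, since the files are gone afterwards.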

Fast and easy! We can now check out the contents of the data.

In [None]:
gdf.head()

Some columns of interest include:
* `Measurement Tool [...]`: The measurement tool used to measure snow depth: MP = magnaprobe, M2 = Mesa2 tablet, and PR = pit ruler.
* `Date [...]`: The date of the measurement, in yyyymmdd format.
* `PitID`: The designated pit ID associated with the measurement.
* `Depth (cm)`: Snow depth, in centimeters.
* `elevation (m)`: Surface elevation at the location of the measurement.
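
Because the dates are stored as yyyymmdd values, they usually need to be parsed before any time-based filtering or plotting. The sketch below shows one way to do this with `pd.to_datetime`. The column name `date_yyyymmdd` and the values are hypothetical placeholders, so substitute the full date column name shown by `gdf.head()`.

```python
import pandas as pd

# Hypothetical column name and values, for illustration only
df = pd.DataFrame({"date_yyyymmdd": [20200128, 20200201, 20200212]})

# Convert to string first so integer and string inputs are handled the same way
df["date"] = pd.to_datetime(df["date_yyyymmdd"].astype(str), format="%Y%m%d")

print(df["date"].dt.strftime("%Y-%m-%d").tolist())
```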

The key variables for this example - the measurement approach and the depth - could use renaming. Let's do that now.

In [None]:
gdf.rename(columns={"Measurement Tool (MP = Magnaprobe; M2 = Mesa 2; PR = Pit Ruler)": 'measurement_tool',
                    "Depth (cm)": 'snow_depth'}, inplace=True
          )

Now, let's make a map plot showing the locations of the measurements, colored by snow depth value.

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))

# Define min/max values for colormap
vmin = gdf['snow_depth'].quantile(0.15)
vmax = gdf['snow_depth'].quantile(0.85)

# Convert to EPSG:3857 (Web Mercator) to match the contextily basemap.
# Converting unconditionally ensures gdf_web is always defined.
gdf_web = gdf.to_crs(epsg=3857)
ax.set_xlim(gdf_web.total_bounds[[0, 2]])
ax.set_ylim(gdf_web.total_bounds[[1, 3]])

# Plot snow depths by location
gdf_web.plot(
    column='snow_depth',
    ax=ax,
    markersize=10,
    cmap='cmc.navia',
    legend=True,
    legend_kwds={'shrink': 0.3, 'label': 'Snow depth [cm]'},
    vmin=vmin,
    vmax=vmax
)

# Add topographic map for spatial reference
ctx.add_basemap(
    ax, 
    source=ctx.providers.OpenTopoMap,
    zoom='auto'
)

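# Axes are in Web Mercator coordinates, not the UTM values from the CSV columns
ax.set_xlabel("Easting (EPSG:3857) [m]", fontsize=14)
ax.set_ylabel("Northing (EPSG:3857) [m]", fontsize=14)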
plt.tight_layout()
plt.show()

The above map looks pretty cool, but we might also want to see how the measured depth distributions differ between measurement tools. Let's now use `seaborn` to generate a snow depth histogram for each instrument.

In [None]:
import seaborn as sns

# Get the unique measurement values
unique_measurements = gdf['measurement_tool'].unique()

# Set a consistent background (must happen before the axes are created to take effect)
sns.set_style("whitegrid")
# Make a 1x3 figure, one panel per tool
fig, axs = plt.subplots(1, 3, figsize=(18, 6), sharey=True)

# Loop through unique measurement tools to make a plot for each
for i, measurement in enumerate(unique_measurements):
    subset = gdf[gdf['measurement_tool']==measurement]

    # Make a histogram with a KDE overlay, normalized by density rather than raw counts
    sns.histplot(subset['snow_depth'],
                 ax=axs[i],
                 kde=True,
                 bins=30,
                 edgecolor='black',
                 linewidth=0.5,
                 stat="density",
                 common_norm=False
                )

    # Draw a vertical line at the median snow depth
    median_val = subset['snow_depth'].median()
    axs[i].axvline(median_val, color='green', linestyle='--', linewidth=2,
                   label=f'Median: {median_val:.1f} cm')

    # Add text that notes the total number of measurements
    axs[i].text(
            0.05, 0.95,
            f"n = {len(subset)}",
            transform=axs[i].transAxes,
            fontsize=12,
            verticalalignment='top'
    )

    axs[i].set_title(f'{measurement}', fontsize=14)
    axs[i].set_xlabel("Depth (cm)", fontsize=14)

    # Set y-label only for first figure
    if i == 0:
        axs[i].set_ylabel("Density", fontsize=14)
    else:
        axs[i].set_ylabel(" ")

    axs[i].legend(loc='upper right')

plt.tight_layout()
plt.show()

This plot lets us compare the different instruments. The magnaprobe depths are the lowest by a slight margin, and also appear to have the smallest spread. The Mesa2 tablet depths are the highest by a few centimeters, and the pit ruler depths have the largest spread.
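
To back up a visual comparison like this with numbers, the median and spread can be tabulated per tool with a `groupby`. The sketch below uses a small made-up table (not SnowEx data) with the same `measurement_tool` and `snow_depth` column names as the renamed GeoDataFrame. The `iqr` helper is a hypothetical convenience function.

```python
import pandas as pd

# Made-up values for illustration only; in the notebook, use gdf instead of toy
toy = pd.DataFrame({
    "measurement_tool": ["MP", "MP", "MP", "M2", "M2", "M2", "PR", "PR", "PR"],
    "snow_depth": [90.0, 95.0, 100.0, 96.0, 101.0, 106.0, 80.0, 98.0, 116.0],
})


def iqr(series):
    """Interquartile range, a spread measure that is robust to outliers."""
    return series.quantile(0.75) - series.quantile(0.25)


# One row per tool: sample size, median depth, standard deviation, and IQR
summary = toy.groupby("measurement_tool")["snow_depth"].agg(
    n="count", median="median", std="std", iqr=iqr
)
print(summary)
```

Replacing `toy` with `gdf` would give the same table for the real measurements.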

## SnowEx Database Example
(work in progress)

In [None]:
from snowexsql.api import PointMeasurements, LayerMeasurements

# Instantiate the class to use the properties!
measurements = PointMeasurements()

# Get the unique data names/types in the table
results = measurements.all_types
print('Available types = {}'.format(', '.join([str(r) for r in results])))

# Get the unique instruments in the table
results = measurements.all_instruments
print('\nAvailable Instruments = {}'.format(', '.join([str(r) for r in results])))

# Get the unique dates in the table
results = measurements.all_dates
print('\nAvailable Dates = {}'.format(', '.join(sorted([str(r) for r in results]))))

# Get the unique site names in the table
results = measurements.all_site_names
print('\nAvailable sites = {}'.format(', '.join([str(r) for r in results])))

In [None]:
# Instantiate LayerMeasurements to access its properties
layer_measurements = LayerMeasurements()

# A single record for one site (e.g. to get its geometry) could be queried with:
#site_df = LayerMeasurements.from_filter(site_id=site_id, limit=1)

# List every site ID available in the layer measurements table
layer_measurements.all_site_ids

In [None]:
data_type = 'depth'

In [None]:
import pandas as pd

# Get one depth measurement per site so that each site can be shown on a map
site_gdfs = []
for site in LayerMeasurements().all_site_ids:
    tmp = PointMeasurements.from_filter(site_id=site, type='depth', limit=1)
    if tmp.empty:
        continue  # no depth measurements found for this site
    try:
        tmp = tmp.to_crs("EPSG:4326")
    except ValueError:
        # to_crs raises ValueError when no CRS is set; assume lat/lon in that case
        tmp = tmp.set_crs("EPSG:4326")

    site_gdfs.append(tmp)

# Combine all sites in one step, rather than concatenating inside the loop
site_df = pd.concat(site_gdfs)

In [None]:
site_df.explore()

In [None]:
# Look up the site ID for a specific record ID (e.g. one picked from the map above)
site_df.loc[site_df['id'] == 7643177, 'site_id'].values

In [None]:
# Import the class used to query point measurements
from snowexsql.api import PointMeasurements
from datetime import datetime

# Query up to 1000 snow depth point measurements at the "Skyway Tree" site.
# Uncomment the filters below to restrict the results by date or instrument.
df = PointMeasurements.from_filter(
    type="depth",
    site_id="Skyway Tree",
    #date_greater_equal=datetime(2018, 1, 1),
    #date_less_equal=datetime(2022, 12, 1),
    #instrument="magnaprobe",
    limit=1000
)

In [None]:
df

In [None]:
df.plot(column='value', cmap='Blues')

In [None]:
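# Find depth measurements around the first point of the previous query
# (the buffer is assumed to be in meters, the units of a projected CRS)
df_area = PointMeasurements.from_area(pt=df.geometry.iloc[0], type=data_type, limit=1000, buffer=200)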

In [None]:
df_area.keys()

In [None]:
df_area.plot(column='value')

In [None]:
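# Drop records whose value is the string 'None', then convert the rest to float.
# .copy() avoids a SettingWithCopyWarning when assigning to the filtered frame.
df = df[df.value != 'None'].copy()
df['value'] = df['value'].astype(float)
# Mean snow depth per site
print(df[['site_id', 'value']].groupby(by='site_id').mean())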

In [None]:
point = Point(df.iloc[0].easting, df.iloc[0].northing)

In [None]:
df = PointMeasurements.from_filter(type='two_way_travel', limit=100)
df

In [None]:
df.plot()