# Load Precursor Coincident Dataset Sites On Demand

`coincident` supports both reading into memory and downloading the Precursor Coincident Dataset (PCD) sites, which are described at https://coincident.readthedocs.io/en/latest/datasets.

You can read individual sites into memory by their PCD ID, or you can download the spatiotemporal metadata for all sites.

In [None]:
from coincident import pcd_fixtures
import matplotlib.pyplot as plt

In [None]:
pcd_fixtures.read_pcd_site?

As a reminder, the sites are listed below:

| Provider | PCD Site Identifier          | Description                                                 | Fourway Overlap Area (km²) | Aerial Lidar Start Date | Aerial Lidar End Date |
| :------- | :--------------------------- | :---------------------------------------------------------- | :------------------------- | :---------------------- | :-------------------- |
| USGS     | `CA_SanFrancisco_1_B23`      | Urban over San Francisco                                    | 55                         | 2023-04-20              | 2023-04-20            |
| USGS     | `AZ_PimaCo_2_2021`           | Desert / Mine in southern Arizona                           | 53                         | 2021-09-27              | 2021-11-17            |
| NEON     | `REDB`                       | Deciduous / Conifer in northern Utah                        | 25                         | 2021-05-20              | 2021-05-21            |
| USGS     | `NE_Northeast_Phase2_2_2020` | Cropland in eastern Nebraska                                | 540                        | 2020-11-16              | 2020-12-09            |
| USGS     | `WI_Brown_2_2020`            | Urban / Wetlands in Green Bay, Wisconsin                    | 89                         | 2020-05-07              | 2020-05-07            |
| USGS     | `GA_Central_3_2019`          | Mixed LULC in southern Georgia                              | 745                        | 2020-02-02              | 2020-03-28            |
| NCALM    | `OTLAS.092021.32611.1`       | Southern San Andreas fault line                             | 12                         | 2020-02-15              | 2020-02-18            |
| USGS     | `CA_YosemiteNP_2019`         | Coniferous / Mountainous in northern Yosemite National Park | 84                         | 2019-10-07              | 2019-10-23            |
| USGS     | `TX_DesertMountains_B1_2018` | Shrubland / Grassland in western Texas                      | 165                        | 2019-09-11              | 2019-10-20            |
| NEON     | `BART`                       | Mixed hardwood forest in eastern New Hampshire              | 32                         | 2019-08-25              | 2019-08-25            |
| USGS     | `CO_WestCentral_2019`        | Coniferous / Mountainous in the Colorado Rockies            | 184                        | 2019-08-21              | 2019-09-19            |
| USGS     | `WY_FEMA_East_B9_2019`       | Glaciers / Mountainous in western Wyoming                   | 681                        | 2019-07-26              | 2019-09-22            |
| NEON     | `WREF`                       | Conifer forest in southern Washington state                 | 18                         | 2019-07-12              | 2019-07-15            |

```{note}
The "ground truth" for the NCALM site is not aerial lidar but dense aerial SfM from a drone.
```

In [None]:
%%time
dict_GA_Central_3_2019 = pcd_fixtures.read_pcd_site("GA_Central_3_2019")

```{note}
Some sites take longer to read than others (from a few seconds to ~1 minute). Read time depends mainly on the overlap area and the length of the date range.
```
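Since read time varies by site, it can help to time a few reads before looping over many sites. The sketch below is one way to do that. `time_site_reads` is a hypothetical helper (not part of `coincident`) that accepts any reader callable, for example `pcd_fixtures.read_pcd_site`.

```python
import time


def time_site_reads(site_ids, reader):
    """Time `reader(site_id)` for each site and return seconds per site.

    Hypothetical helper, not part of coincident. `reader` is any callable
    that takes a PCD site ID, e.g. pcd_fixtures.read_pcd_site.
    """
    timings = {}
    for site_id in site_ids:
        start = time.perf_counter()
        reader(site_id)  # result is discarded; only the elapsed time is kept
        timings[site_id] = time.perf_counter() - start
    return timings


# Example (requires network access):
# from coincident import pcd_fixtures
# time_site_reads(["REDB", "BART"], pcd_fixtures.read_pcd_site)
```

Passing the reader as an argument keeps the helper independent of `coincident`, so it can be tried with a stand-in function first.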

In [None]:
gf_als = dict_GA_Central_3_2019["als"]
gf_maxar = dict_GA_Central_3_2019["maxar"]
gf_is2 = dict_GA_Central_3_2019["is2"]
gf_gedi = dict_GA_Central_3_2019["gedi"]
gf_overlap = dict_GA_Central_3_2019["overlap"]

In [None]:
gf_als

In [None]:
gf_maxar

In [None]:
gf_is2

In [None]:
gf_gedi

In [None]:
gf_overlap

Visualize the acquisition times for the PCD site.

In [None]:
fig, ax = plt.subplots(figsize=(12, 5))

als_start = gf_als["start_datetime"].iloc[0]
als_end = gf_als["end_datetime"].iloc[0]
ax.axvspan(als_start, als_end, color="gray", alpha=0.3, label="ALS Window")

ax.scatter(
    gf_maxar["datetime"],
    ["Maxar"] * len(gf_maxar),
    marker="D",
    s=80,
    label="Maxar Stereo",
)
ax.scatter(
    gf_is2["datetime"], ["ICESat-2"] * len(gf_is2), marker="D", s=80, label="ICESat-2"
)
ax.scatter(gf_gedi["datetime"], ["GEDI"] * len(gf_gedi), marker="D", s=80, label="GEDI")

ax.set_title("Data Availability for Site: GA_Central_3_2019")
ax.set_xlabel("Date")
ax.set_ylabel("Data Collection")
ax.legend(loc="best")
ax.grid(axis="x", linestyle=":", alpha=0.4)
fig.autofmt_xdate()
plt.tight_layout();
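The plot shows temporal proximity visually. To quantify it, one option is to compute how many days each acquisition falls before or after the ALS window. `days_outside_window` below is a hypothetical helper, and the commented usage assumes the `datetime`, `start_datetime`, and `end_datetime` columns hold comparable timestamps (for example, the same timezone awareness).

```python
from datetime import datetime


def days_outside_window(acquired, window_start, window_end):
    """Signed days between an acquisition and a time window.

    Negative: before the window. Positive: after it. Zero: inside it.
    Hypothetical helper, not part of coincident.
    """
    if acquired < window_start:
        return (acquired - window_start).total_seconds() / 86400
    if acquired > window_end:
        return (acquired - window_end).total_seconds() / 86400
    return 0.0


# Standalone example using the GA_Central_3_2019 dates from the table
start, end = datetime(2020, 2, 2), datetime(2020, 3, 28)
print(days_outside_window(datetime(2020, 1, 30), start, end))  # -3.0
print(days_outside_window(datetime(2020, 4, 2), start, end))   # 5.0

# With the GeoDataFrames from this notebook (assumes comparable timestamps):
# gf_maxar["datetime"].apply(days_outside_window, args=(als_start, als_end))
```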

View the spatial extent.

In [None]:
style_args = {"fillOpacity": 0.15, "weight": 2.5}

m = gf_als.explore(
    name="ALS", color="gray", style_kwds=style_args, tiles="Esri.WorldImagery"
)
m = gf_maxar.explore(m=m, name="Maxar", color="blue", style_kwds=style_args)
m = gf_is2.explore(m=m, name="ICESat-2", color="orange", style_kwds=style_args)
m = gf_gedi.explore(m=m, name="GEDI", color="green", style_kwds=style_args)
m = gf_overlap.explore(m=m, name="Overlap Area", color="black", style_kwds=style_args)
m

## Downloading

You can also download the PCD files for all sites using the `coincident.pcd_fixtures` module. `coincident.pcd_fixtures.download_pcd_files()` supports this by streaming calls to the respective STAC catalogs and ALS endpoints.

The script in `coincident/scripts`, shown below, instead downloads the latest released GitHub assets for these PCD files.
```bash
pixi run python src/coincident/scripts/generate_pcd.py
```

*What's the difference between the two?*

`coincident.pcd_fixtures.download_pcd_files()` grabs the metadata for all PCD sites as provided by the respective STAC catalogs and API endpoints. `src/coincident/scripts/generate_pcd.py` pulls from the latest GitHub assets, which include more complex overlap area geometries, LULC and elevation statistics over these geometries, and extended ALS metadata. The difference exists because the extra metadata in the GitHub assets was determined manually for each site (by reading individual lidar metadata reports, manually defining overlap geometries based on filtered data, using code that exists outside of `coincident`, etc.).

Because `coincident.pcd_fixtures.download_pcd_files()` queries the catalogs and endpoints live while `src/coincident/scripts/generate_pcd.py` fetches prebuilt assets, the former takes minutes to run and the latter takes seconds.

In [None]:
pcd_fixtures.download_pcd_files?

In [None]:
# pcd_fixtures.download_pcd_files("/tmp")

```{note}
This takes ~4 minutes to run, and the total output size for all files is 31 MB (Parquet files sum to ~8.5 MB and GeoJSON files sum to ~22 MB).
```
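To check a download against the sizes quoted above, you could total the output files by extension. This is a minimal sketch using only the standard library. `summarize_by_suffix` is a hypothetical helper, and it assumes the files end up somewhere under the directory passed to `download_pcd_files` (for example `/tmp`, as in the commented-out call).

```python
from collections import defaultdict
from pathlib import Path


def summarize_by_suffix(directory):
    """Return total size in bytes per file suffix under `directory` (recursive).

    Hypothetical helper, not part of coincident.
    """
    totals = defaultdict(int)
    for path in Path(directory).rglob("*"):
        if path.is_file():
            totals[path.suffix.lower()] += path.stat().st_size
    return dict(totals)


# Example: print sizes in MB for an output directory
# for suffix, size in sorted(summarize_by_suffix("/tmp").items()):
#     print(f"{suffix or '(none)'}: {size / 1e6:.1f} MB")
```

Pointing this at a dedicated output directory rather than a shared one such as `/tmp` avoids counting unrelated files.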