---
format: ipynb
jupyter: julia-1.11
engine: julia
---

# Weather data in `Cropbox`

## Objectives {.unnumbered}
- Explore `Cropbox` for handling weather data for dynamic model simulations

## Readings {.unnumbered}
-   (**required**) Yun K, Kim S-H (2023) Cropbox: a declarative crop modelling framework. in silico Plants 5(1), diac021 (<https://doi.org/10.1093/insilicoplants/diac021>)

## Weather variables 

Weather variables that are commonly monitored in weather stations include air temperature, solar radiation, wind speed, precipitation, and relative humidity. These are critical environmental variables that dictate plant life. Most plant growth models take these weather variables as input to drive plant growth.

Other atmospheric variables that are important for plants but often missing from weather station data include $\mathrm{CO_2}$ and $\mathrm{O_2}$. The **atmospheric concentration of $\mathrm{CO_2}$** (\[$\mathrm{CO_2}$\]) is particularly important because $\mathrm{CO_2}$ is the carbon source that plants assimilate into sugars through photosynthesis. Yet most weather stations do not monitor \[$\mathrm{CO_2}$\], likely because it does not typically drive or change short-term weather, even though the global climate has been changing at an unprecedented rate because of rapidly rising atmospheric \[$\mathrm{CO_2}$\]. \[$\mathrm{CO_2}$\] also varies considerably diurnally, seasonally, and spatially, which makes it a critical gap in the data collected by most weather stations. Many earlier crop models did not include $\mathrm{CO_2}$ as a driving variable, largely because it was assumed to be stable and was missing from weather data. Most modern plant models, however, include \[$\mathrm{CO_2}$\] as a key input variable; when locally measured values are unavailable, the global average can be used (i.e., 427 $\mathrm{\mu mol}\ \mathrm{mol^{-1}}$ measured at [Mauna Loa Observatory](https://gml.noaa.gov/ccgg/trends/)). Similarly, the $\mathrm{O_2}$ concentration is assumed to be 21% (v/v) at sea level.

### Weather and climate data sources

Listed below are some of the websites that host weather and climate data in the US and around the world that I have found useful. 

- Washington State University's AgWeatherNet (<https://weather.wsu.edu/>) 
- California Dept of Water Resources' CIMIS (<https://cimis.water.ca.gov/>) 
- Cornell University's NEWA for eastern and mid-west states (<https://newa.cornell.edu/>) 
- NOAA's Climate Data Online (<https://www.ncei.noaa.gov/cdo-web/>): Daily, monthly, seasonal, and yearly historical weather data and climate normals 
- Oregon State University's PRISM Climate Data (<https://prism.oregonstate.edu/>): Good place to get climate normals
- CCAFS Climate (<https://www.ccafs-climate.org/>): Spatially downscaled GCM climate projection data for agricultural applications
- NASA POWER (<https://power.larc.nasa.gov/>): Global weather and climate data including solar radiation
- WorldClim (<https://www.worldclim.org/data/index.html>): Global climate data including historical and future climate projections

:::{#nte-weather-daily .callout-note}
### Daily weather data format example
We will work with a weather dataset collected in 2002 at a research field in Beltsville, MD, where corn was grown. The data file (`MD_Beltsville_2002-daily.csv`) includes daily weather data from May 1 to September 30, 2002, with the following variables.

- `year`: Year (e.g., 2002)
- `jday`: Day of year, often called Julian day (1 to 365, or 366 in leap years)
- `Tavg`: Average air temperature ($^\circ$C)
- `Tmin`: Minimum air temperature ($^\circ$C)
- `Tmax`: Maximum air temperature ($^\circ$C)
- `rad_tot`: Total solar radiation (MJ/m$^2$/day)
- `RH`: Relative humidity (%)
- `Wind`: Wind speed (m/s)
- `Rain`: Precipitation (mm)
:::
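The `year` and `jday` columns together identify a calendar date. Here is a minimal sketch using Julia's standard `Dates` library; the helper name `jday_to_date` is ours, for illustration only.

```julia
using Dates

# Hypothetical helper: convert a year and day of year (`jday`) to a calendar Date.
# Day 1 is January 1, so we add (jday - 1) days to the first day of the year.
jday_to_date(year::Integer, jday::Integer) = Date(year) + Day(jday - 1)

jday_to_date(2002, 121)        # May 1, 2002 (start of the file)
jday_to_date(2002, 273)        # September 30, 2002 (end of the file)

# The inverse: dayofyear returns the day of year of a Date
dayofyear(Date(2002, 5, 15))   # → 135 (the planting date)
```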

## Working with weather data in `Cropbox` ###
Weather data drive plant growth models. In `Cropbox`, we can create a system to manage weather variables, time, and other routine tasks. Let's create a system called `Weather_daily` that serves as a container for the weather variables discussed above. Because this system does no work on its own and serves only as a data container, we do not mix in `Controller`; as we saw in Lab 01, this means it cannot be instantiated by itself. These weather variables change over time and are typically provided as time series by weather stations. We will use this system to hold the daily weather data we read and plotted previously.

### Working with `daily` weather data

In [None]:
using Cropbox

In [None]:
@system Weather_daily begin
    calendar(context)             ~      ::Calendar
    t(calendar.date): date        ~ track::date

    data:                source_data     ~ provide(parameter, index = :date, init = t)
    Tavg:                avg_temperature ~ drive(from = data, by = :Tavg, u"°C")
    Tmin:                min_temperature ~ drive(from = data, by = :Tmin, u"°C")
    Tmax:                max_temperature ~ drive(from = data, by = :Tmax, u"°C")
    solrad:              solar_radiation ~ drive(from = data, by = :rad_tot, u"MJ/m^2/d^1")
    DLI:            daily_light_integral ~ drive(from = data, by = :DLI, u"mol/m^2/d^1")    
    T(Tavg):             temperature  => Tavg   ~ track(u"°C") 
    
end

`Weather_daily` is a mixin system for loading weather variables from an external data source. It relies on two variable kinds, `provide` and `drive`. To handle time as calendar dates (YYYY-MM-DD), it also embeds the `Calendar` system included with Cropbox.

- `provide` *provides* a data frame with given index (`index`) starting from an initial value (`init`). Since we're going to provide a data frame using a configuration, the variable is tagged `parameter`.

- `drive` makes a *driving* variable from a data source (`from`) with a given column name (`by`). The data source is often supplied by `provide`.

- `Calendar` works like the `Clock` system embedded in `Context`: it provides `time` and `step` variables, but `time` is of type `ZonedDateTime` (`datetime` in Cropbox).

```julia
@system Calendar begin
    init ~ preserve::datetime(extern, parameter)
    last => nothing ~ preserve::datetime(extern, parameter, optional)
    time(t0=init, t=context.clock.time) => t0 + convert(Cropbox.Dates.Second, t) ~ track::datetime
    date(time) => Cropbox.Dates.Date(time) ~ track::date
    step(context.clock.step) ~ preserve(u"hr")
    stop(time, last) => begin
        isnothing(last) ? false : (time >= last)
    end ~ flag
    count(init, last, step) => begin
        if isnothing(last)
            nothing
        else
            # number of update!() required to reach `last` time
            (last - init) / step
        end
    end ~ preserve::int(round, optional)
end
```
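To make the `provide`/`drive` pattern above concrete, here is a conceptual sketch in plain Julia of what a driving variable does at each time step: look up the row indexed by the current date and read one column from it. This is only an analogy, not how Cropbox implements `provide` or `drive`, and the table values are made up.

```julia
using Dates

# Conceptual sketch only (not Cropbox's implementation).
# A "provided" table indexed by date: a Dict from Date to a row of values (hypothetical numbers).
data = Dict(
    Date(2002, 5, 15) => (Tavg = 18.2, rad_tot = 21.0),
    Date(2002, 5, 16) => (Tavg = 20.5, rad_tot = 24.3),
)

# A "driven" value: at the current time t, read column `by` from the row indexed by t.
drive_value(data, t::Date, by::Symbol) = getproperty(data[t], by)

drive_value(data, Date(2002, 5, 16), :Tavg)      # → 20.5
drive_value(data, Date(2002, 5, 15), :rad_tot)   # → 21.0
```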

We will use the same weather data from the 2002 corn experiment we saw earlier. Recall that in this experiment seeds were planted on May 15, 2002, and plants were harvested in mid-September. Also recall that we stored the data as a data frame, `df1`, and added a few more variables such as `rad_tot` and `DLI`.

We create a configuration that contains the information and data needed to run dynamic simulations as the simulation clock moves through the time period defined in the `Calendar` part of the configuration.

In [None]:
# This block of code is from the 'radiation' section; it imports and prepares the weather data

using CSV, DataFrames 
# CSV reads and writes CSV (comma-separated values) and other plain-text files

df1 = CSV.read("./data/MD_Beltsville_2002-daily.csv", DataFrame) |> unitfy

# Unit conversion from daily mean radiation to daily total radiation in MJ m^-2 d^-1.
# Unitful takes care of this conversion.
df1.rad_tot = df1.rad .|> u"MJ/m^2/d^1";

# Convert daily total solar radiation to DLI. This requires spectrum-specific information:
# we assume 45% of solar radiation is PAR and a PAR-to-PFD conversion factor of 4.6 μmol/J.
df1.DLI = df1.rad_tot * 0.45*4.6u"μmol/J" .|> u"mol/m^2/d";
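The two conversions above can be checked by hand. This sketch uses plain numbers instead of Unitful and assumes that `rad` is a daily mean irradiance in W m⁻² (J m⁻² s⁻¹); the 200 W m⁻² value is hypothetical.

```julia
# Worked example of the conversions above with plain numbers (no Unitful).
# Assumption: `rad` is a daily mean irradiance in W m^-2 (= J m^-2 s^-1).
rad_mean = 200.0                               # W m^-2 (hypothetical value)

seconds_per_day = 24 * 60 * 60                 # 86400 s
rad_tot = rad_mean * seconds_per_day / 1e6     # MJ m^-2 d^-1, ≈ 17.28

par_fraction = 0.45                            # fraction of solar radiation that is PAR
umol_per_J   = 4.6                             # PAR energy to photon flux, μmol J^-1

# MJ → J (×1e6), J → μmol (×4.6), μmol → mol (×1e-6): the two 1e6 factors cancel
DLI = rad_tot * par_fraction * umol_per_J      # mol m^-2 d^-1, ≈ 35.77
```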

In [None]:
c1 = @config (
    :Calendar => (
        :init => ZonedDateTime(2002, 5, 15, tz"America/New_York"),
        :last => ZonedDateTime(2002, 9, 30, tz"America/New_York"),
    ),
    :Clock => (;
        :step => 1u"d",
    ),
    :Weather_daily => (;
        :data => df1, # provide 'df1' dataframe we worked with previously as source data
    ),
)

The `Calendar` system embedded in `Weather_daily` above accepts `init` and `last` parameters of type `ZonedDateTime`, which represents timestamps with proper time zone support. We need a time zone because the default time step in Cropbox is one hour, and hourly timestamps become tricky around daylight saving time transitions. To keep our exercises simple, we will use a daily time step for most simulations here. `Calendar` also provides a `stop` flag that becomes true once `time` reaches `last`, which tells `simulate()` when the simulation is done.
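As a quick check on the `count` variable in the `Calendar` definition above, which is (last - init) / step, here is the same arithmetic with plain `Dates` values for the season configured in `c1` (May 15 to September 30, 2002). This sketch works at day resolution and ignores time zones, unlike `Calendar` itself.

```julia
using Dates

init_date = Date(2002, 5, 15)
last_date = Date(2002, 9, 30)

elapsed = last_date - init_date       # a Day period
n_days  = Dates.value(elapsed)        # → 138

# count = (last - init) / step
count_daily  = n_days                 # step = 1 day:  138 updates
count_hourly = n_days * 24            # step = 1 hour: 3312 updates
```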

In [None]:
@system WeatherViewer(Weather_daily, Controller)

We created a system called `WeatherViewer` with the `Weather_daily` and `Controller` systems mixed in so that it can be instantiated. Let's create an instance of this system called `w1` with the `c1` configuration we defined.

In [None]:
w1 = instance(WeatherViewer, config = c1)

`w1` contains a number of variables. The weather data we examined and stored as the `df1` data frame is now embedded in this instance as the `data` variable of the `Weather_daily` mixin. Let's look at the first 10 records of the embedded weather data.

In [None]:
w1.data.value[1:10, :];

We can run a simulation with `simulate` on this weather data. There is really nothing to compute here, but the key difference from just viewing the data above is that the `Clock` steps through `data` dynamically according to the `Calendar` settings. We will save the output as a data frame called `s1` that contains `t`, `T`, `DLI`, and `solrad` (not `rad_tot`).

In [None]:
s1=simulate(WeatherViewer;
    config = c1,
    stop = "calendar.stop",
#    snap = u"7d",
    index = :t,
    target = [:T, :solrad, :DLI],
);

In [None]:
plot(s1, :t, :T; kind = :line);

#### Do something more?  ####
While it is nice to be able to run something like this, this system does little more than hold the weather data for each time step. Let's create a system that does a bit more: accumulate weather variables of interest over time. This is useful in plant growth modeling because plants respond to cumulative temperature for their development and to cumulative light for their growth; plant growth is an integrated response to the environmental conditions the plants experience. These cumulative relationships are especially useful when they are linear, and they form the basis of the growing degree day (GDD) and light (or radiation) use efficiency (LUE or RUE) models we saw earlier.

Let's create a new system called `WeatherSum` that calculates and stores sums of weather variables over a period. We will calculate growing degree days with a base temperature ($T_b$) of 10 °C and an optimum temperature ($T_o$) of 30 °C. We will also calculate cumulative solar radiation over the simulation period.
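Before writing the system, here is the same GDD rule as a minimal plain-Julia sketch on a few hypothetical daily mean temperatures (°C, no units): temperatures above $T_o$ are capped at $T_o$, and negative values are clipped to zero.

```julia
# Minimal sketch of the daily GDD rule with plain Float64 values in °C.
gdd(T; Tb = 10.0, To = 30.0) = max(min(T, To) - Tb, 0.0)

Tavg = [8.0, 15.0, 25.0, 32.0]    # hypothetical daily mean temperatures (°C)
daily = gdd.(Tavg)                 # → [0.0, 5.0, 15.0, 20.0]
cumulative = cumsum(daily)         # → [0.0, 5.0, 20.0, 40.0]
```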

In [None]:
@system WeatherSum(WeatherViewer) begin   
    GDD(T): growing_degree_day_K => begin
        (min(T,30.0u"°C") - 10.0u"°C") / 1u"d"
    end ~ track(min = 0, u"K/d")

    cGDD(GDD):                         cumulative_GDD  ~ accumulate(u"K")
    solrad_sum(solrad):        sum_of_solar_radiation  ~ accumulate(u"MJ/m^2")
    GDD_rating:        maturity_growing_degree_rating => 1500  ~ preserve(u"K", parameter)
end

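Note that the temperature difference is expressed in kelvin (K) rather than degrees Celsius (°C). The `Unitful` package treats °C (and °F) as affine units that are offset from absolute zero: subtracting one °C value from another is allowed and returns the difference in K, but adding °C values is not. Because a difference of 1 K equals a difference of 1 °C, GDD values in K are numerically the same as in °C. See <https://painterqubits.github.io/Unitful.jl/stable/#Usage-examples>.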
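A small sketch of this behavior, assuming the `Unitful` package is installed (Cropbox depends on it). The temperatures are arbitrary examples.

```julia
using Unitful

T  = 25.0u"°C"
Tb = 10.0u"°C"

# Subtracting two °C values is allowed; the difference comes back in kelvin
ΔT = T - Tb              # 15.0 K

# Dividing by one day gives a rate in K/d, like the GDD variable above
rate = ΔT / 1u"d"

# Adding two °C values is rejected by Unitful, because the sum has no physical meaning
added = try
    T + Tb
    true
catch
    false
end
```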

But wait, didn't we already create a system to calculate GDD and cGDD in @sec-temperature? Yes, we did. So we can simply reuse that system as a mixin here.

In [None]:
"Growing degree days and killing degree days calculator"
@system DegreeDays begin
    T:  temperature                              ~ preserve(parameter, u"°C")
    Tb: base_temperature                         ~ preserve(parameter, u"°C")
    To: optimal_temperature                      ~ preserve(parameter, u"°C")
    GDD_rating: maturity_growing_degree_rating   ~ preserve(parameter, u"K")
    
    GD(T, Tb, To): growing_degree => begin
        min(T, To) - Tb
    end ~ track(min = 0, u"K")

    GDD(GD): growing_degree_day => begin
        GD / 1u"d"
    end ~ track(u"K/d")

    cGDD(GDD): cumulative_growing_degree_day ~ accumulate(u"K")

    KD(T, To): killing_degree => begin
        T - To
    end ~ track(min = 0, u"K")

    KDD(KD): killing_degree_day => begin
        KD / 1u"d"
    end ~ track(u"K/d")

    cKDD(KDD): cumulative_killing_degree_day ~ accumulate(u"K")
end
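`DegreeDays` splits each day's temperature into growing degrees (up to $T_o$) and killing degrees (the part above $T_o$). A plain-number sketch for one hypothetical hot day shows that, above $T_o$, the two parts together account for the whole excess over $T_b$.

```julia
# Plain-number sketch of the GD/KD split in DegreeDays (°C values, no units).
Tb, To = 10.0, 30.0

gd(T) = max(min(T, To) - Tb, 0.0)   # growing degrees: capped at To, floored at 0
kd(T) = max(T - To, 0.0)            # killing degrees: only the part above To

T = 34.0                             # a hypothetical hot day
gd(T), kd(T)                         # → (20.0, 4.0)

gd(T) + kd(T) == T - Tb              # → true
```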

Here's the configuration with 10 °C as the base temperature and 30 °C as the optimal temperature for a corn hybrid with a GDD rating of 1500 degree days.

In [None]:
# prepare a configuration to simulate and visualize
dd_config = @config (
    Clock => (;
        step = 1u"d",
    ),
    DegreeDays => (;
        Tb = 10.0,
        To = 30.0,
        GDD_rating = 1500,
    ),
)

Then make sure that the `DegreeDays` system is mixed in before `WeatherViewer`, so that `WeatherViewer`, which includes `Controller`, is the last mixin.

In [None]:
@system WeatherSum(DegreeDays, WeatherViewer) begin
    solrad_sum(solrad):        sum_of_solar_radiation  ~ accumulate(u"MJ/m^2")
end

If there are no errors, run the simulation.

In [None]:
# different ways to combine configurations
#s1=simulate(WeatherSum; config = @config(c1 + dd_config), stop = "calendar.stop")
s1=simulate(WeatherSum; config =(c1, dd_config), stop = "calendar.stop");

::: {#exr-corn-gdd}
### Determining corn harvest date at maturity based on GDD rating
As mentioned earlier, this weather dataset is from a research field in Beltsville, MD, where corn was grown in 2002. Commercial corn hybrids are labeled with a maturity rating that indicates their harvest timing: early-maturity hybrids have lower values and late-maturity hybrids have higher values. Assume that the corn cultivar grown in this field had a maturity rating of 1500 K (~2700 °F) to reach [black layer stage](https://www.pioneer.com/us/agronomy/kernel-black-layer-formation.html), with a base temperature of 10 $^\circ$C and an optimal temperature of 30 $^\circ$C. Based on this information, estimate the date on which this corn cultivar would be ready for harvest in this field.
:::
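Degree-day ratings are often quoted in °F. Because they are sums of temperature *differences*, converting between °F·d and K·d (equivalently °C·d) uses only the 9/5 scale factor and no 32-degree offset. A quick check of the numbers used in this chapter:

```julia
# Degree-day sums are temperature differences: 1 K (= 1 °C) difference = 1.8 °F difference.
rating_K = 1500.0
rating_F = rating_K * 9 / 5        # → 2700.0 °F·d

# The 2745 °F·d black-layer value cited in the c3 configuration comment
rating_c3_K = 2745 * 5 / 9         # → 1525.0 K·d
```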

In [None]:
plot(s1, :t, [:cGDD, :GDD_rating]; kind = :line);

In [None]:
# Look at the simulation results closely to find when exactly cGDD meets GDD_rating to determine harvest timing
filter(:cGDD => x -> (1450u"K" < x < 1550u"K"), s1); 
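Instead of scanning filtered rows by eye, we can locate the first day on which cumulative GDD reaches the rating with `findfirst`. This sketch uses synthetic data (a constant, hypothetical 12 K per day) rather than the simulated `s1`; the same pattern should apply to the `cGDD` and `t` columns of the simulation output.

```julia
using Dates

# Synthetic season: May 15 to Sep 30, 2002, with a constant hypothetical GDD of 12 K per day
dates = Date(2002, 5, 15) .+ Day.(0:137)
daily_gdd = fill(12.0, length(dates))
cgdd = cumsum(daily_gdd)

rating = 1500.0
i = findfirst(>=(rating), cgdd)               # first index where cGDD ≥ rating
harvest = isnothing(i) ? nothing : dates[i]   # → 2002-09-16 for this synthetic series
```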

#### How reasonable are our modeling results?
We should check whether our results are reasonable, or at least get an idea of how far off they are. We can do this by comparing our results with those of a crop decision support tool prepared for growers in the Corn Belt at <https://hprcc.unl.edu/agroclimate/cligrow/#>. Here we assume a hybrid with 111 days to maturity, which matches the maturity rating of the hybrid used in our exercises.

### Working with `hourly` weather data ###
In `Cropbox` the default time step is one hour, because physiological processes vary dynamically over the course of a day. Capturing these dynamics is important because physiological and biochemical responses to light, temperature, and other environmental variables are often nonlinear. Plant growth models running on a daily or coarser time step often assume linear responses of plant growth and development to light and temperature; we saw examples in the RUE and GDD models earlier. These assumed linear relationships are likely to break down under extreme conditions (e.g., high temperatures causing heat stress, or saturating light). On the other hand, plant growth models running at hourly or finer intervals impose a greater computational load and take longer to run. This is particularly true for tree or forest growth models that run for decades to centuries at an hourly time step. Another consideration is the availability and quality of weather data for the desired location and frequency. More often than not, weather stations have missing data due to sensor failures and other causes, and the missing values must be gap-filled before the data can be used to run plant growth models.
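A small worked example of why the time step matters for a nonlinear response: for one hypothetical day with 12 cool hours at 6 °C and 12 warm hours at 24 °C, GDD computed from the daily mean differs from GDD averaged over the hourly temperatures, because the hours below $T_b$ are clipped to zero.

```julia
# Same GDD rule as before, plain °C values
gdd(T; Tb = 10.0, To = 30.0) = max(min(T, To) - Tb, 0.0)

# One hypothetical day: 12 hours at 6 °C and 12 hours at 24 °C
hourly_T = vcat(fill(6.0, 12), fill(24.0, 12))

daily_from_mean   = gdd(sum(hourly_T) / 24)      # mean is 15 °C → 5.0
daily_from_hourly = sum(gdd.(hourly_T)) / 24     # (12 × 0 + 12 × 14) / 24 → 7.0
```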

There is a weather station at the UW Center for Urban Horticulture (CUH) that is maintained by AgWeatherNet (<https://weather.wsu.edu/>). We will work with hourly weather data (file: `UW-CUH_2014-2021.csv`) downloaded from this station. The dataset covers 8 years, from 2014 to 2021, and is formatted to include the variables used in the `Weather_hourly` system we create later.

First, some housekeeping. The cell below takes care of datetime formatting, indexing, and duplicate times caused by daylight saving time at US locations. It defines a function called `loadwea` that loads a CSV (comma-separated values) file into a data frame. We will call this function to read a weather file in CSV (or other delimited) format and `provide` a data frame that can `drive` plant growth models. Beyond that, we can ignore the other details in this cell.

In [None]:
using Dates
using TimeZones

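# Build a ZonedDateTime from year, day of year (`jday`), and clock time.
# `occurrence` (1 or 2) picks one of the two instances of a repeated local time
# when clocks fall back at the end of daylight saving time.
datetime_from_julian_day_WEA(year, jday, time::Time, tz::TimeZone, occurrence) =
    zoned_datetime(Date(year) + (Day(jday) - Day(1)) + time, tz, occurrence)
# Midnight of the given day (first occurrence)
datetime_from_julian_day_WEA(year, jday, tz::TimeZone) = datetime_from_julian_day_WEA(year, jday, Time(0), tz, 1)
# Fixed-offset zones have no DST, so `occurrence` is ignored; variable zones (e.g., America/Los_Angeles) need it
zoned_datetime(dt::DateTime, tz::TimeZone, occurrence=1) = ZonedDateTime(dt, tz)
zoned_datetime(dt::DateTime, tz::VariableTimeZone, occurrence=1) = ZonedDateTime(dt, tz, occurrence)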

using CSV
using DataFrames: DataFrames, DataFrame

loadwea(filename, timezone; indexkey=:index) = begin
    df = CSV.File(filename) |> DataFrame
    df[!, indexkey] = map(r -> begin
        occurrence = 1
        # If this row repeats the previous row's time (e.g., the repeated hour when daylight saving time ends), flag it as occurrence 2
        i = DataFrames.row(r)
        if i > 1
            r0 = parent(r)[i-1, :]
            r0.time == r.time && (occurrence = 2) 
        end
        datetime_from_julian_day_WEA(r.year, r.jday, r.time, timezone, occurrence)
    end, eachrow(df))
    df
end
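To see why `loadwea` passes an `occurrence` argument, here is a minimal sketch with the TimeZones.jl constructor on the 2021 fall-back date in the US (November 7, 2021), when 01:00 local time occurred twice in Seattle. The date comes from the US daylight saving calendar, not from the CUH file.

```julia
using Dates, TimeZones

tz = tz"America/Los_Angeles"
dt = DateTime(2021, 11, 7, 1)     # 01:00 local time, which occurred twice that night

# Without `occurrence`, this ambiguous local time raises an error
threw = try
    ZonedDateTime(dt, tz)
    false
catch
    true
end

first_1am  = ZonedDateTime(dt, tz, 1)   # 01:00 PDT (UTC-7)
second_1am = ZonedDateTime(dt, tz, 2)   # 01:00 PST (UTC-8)

# The two readings share a local timestamp but are one real hour apart
gap = second_1am - first_1am            # a Millisecond period equal to one hour
```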

In [None]:
# open weather data using 'loadwea' function we just defined to import it as dataframe named 'df2'
tz = tz"America/Los_Angeles"
df2 = loadwea("./data/UW-CUH_2014-2021.csv", tz);

In [None]:
# This cell splits the CUH weather data by year and saves each year as a separate file.
# for yr in 2014:2021
#         d0=filter([:year] => x -> x == yr, df2)
#         d0 = (d0[!,[:year, :jday, :time, :Tair, :RH, :Wind, :SolRad, :Rain]])    
#         CSV.write("./UW-CUH_" * string(yr) * ".csv", d0, overwrite=true)
# end

We select the 2021 data to work with as an example.

In [None]:
df2021 = filter([:year] => x -> x == 2021, df2);

In [None]:
@system Weather_hourly begin   
    calendar(context)                       ~ ::Calendar
    t(calendar.time): datetime        ~ track::datetime

#    data ~ provide(init= calendar.time, parameter)
    data ~ provide(init= t, index = :index, parameter)

    solrad:   solar_radiation ~ drive(from=data, by=:SolRad, u"W/m^2")
    RH:     relative_humidity ~ drive(from=data, by=:RH, u"percent")   
    T:            temperature ~ drive(from=data, by=:Tair, u"°C")
    Wind:          wind_speed ~ drive(from=data, by=:Wind, u"m/s")
    Rain:            rainfall ~ drive(from=data, by=:Rain, u"mm")    
end    

We define `WeatherViewer_hr` as a mixin of the `Weather_hourly` and `Controller` systems.

In [None]:
@system WeatherViewer_hr(Weather_hourly, Controller) begin
end

In [None]:
@look WeatherViewer_hr

Let's create a new configuration `c2` for 2021 CUH hourly data.

In [None]:
c2 = @config (
    :Calendar => (
        :init => ZonedDateTime(2021, 1, 1, tz"America/Los_Angeles"),
        :last => ZonedDateTime(2021, 12, 31, tz"America/Los_Angeles"),
        ),
    :Weather_hourly => (;
        :data => df2021,
    ),    
    :Clock => (;
        :step => 1u"hr",
     ),
)

Run a simulation of the `WeatherViewer_hr` system with the `c2` configuration and store the output as `s2`.

In [None]:
s2 = simulate(WeatherViewer_hr, config = c2, stop = "calendar.stop");

In [None]:
#| output: false
# change kind from :line to :scatter to plot points instead of a line
Cropbox.plot(s2, :t, :solrad; kind = :line);

In [None]:
visualize(WeatherViewer_hr, "calendar.time", :solrad;
    config = c2,
    stop = "calendar.stop",
    kind = :line,
    xlim = (Date(2021,7,1), Date(2021,7,7)),
);

In [None]:
@system WeatherSum_hr(DegreeDays, WeatherViewer_hr) begin   
    solrad_sum(solrad):        sum_of_solar_radiation  ~ accumulate(u"MJ/m^2")
end
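`solrad_sum` turns hourly irradiance (W m⁻², i.e. J m⁻² s⁻¹) into accumulated energy (MJ m⁻²), with Unitful handling the units. The arithmetic behind it, sketched with plain numbers for one hypothetical day of 12 dark hours and 12 hours at 500 W m⁻²:

```julia
# Hypothetical day of hourly irradiance readings (W m^-2 = J m^-2 s^-1)
hourly_solrad = vcat(fill(0.0, 12), fill(500.0, 12))

step_s = 3600.0                                        # one hourly step, in seconds
daily_total_MJ = sum(hourly_solrad .* step_s) / 1e6    # J m^-2 → MJ m^-2, ≈ 21.6
```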

In [None]:
@look WeatherSum_hr

In [None]:
c3 = @config (c2,
    :Calendar => (
        :init => ZonedDateTime(2021, 5, 15, tz"America/Los_Angeles"),
        :last => ZonedDateTime(2021, 9, 15, tz"America/Los_Angeles"),
        ),        
    :WeatherSum_hr => (
        Tb = 10.0, # 50F
        To = 30.0, # 86F
        GDD_rating = 1500, # Black Layer GDD in F: 2745 with a common corn cultivar with days to maturity rating of 113 days in Mid-West 
                           # See: https://hprcc.unl.edu/agroclimate/cligrow/#
        ),
)

In [None]:
s3 = simulate(WeatherSum_hr; config = c3, stop = "calendar.stop");

::: {#exr-seattle-weather}
### Plot Seattle weather {.unnumbered}
Explore how different climate factors (e.g., solar radiation, temperature, precipitation, wind) change in Seattle over time (e.g., across time periods, seasons, and years) and plot them. Do this on your own in class or elsewhere.
:::

## Homework Problems

### Seattle corns
Can we grow feed corn in Seattle? Application of a growing degree day model

If you are a gardener, you might have heard people say that we don't have enough heat units in Seattle to grow crops like tomato. Is that really true? Curiously, we don't see much feed corn growing in the Puget Sound region. Aside from high land prices, let's check whether our climate is conducive to growing feed corn. In @exr-corn-gdd we ran the growing degree day model with daily weather data from Beltsville, MD, for a corn hybrid with a maturity rating of 1500 K, a base temperature of 10.0 $^\circ$C, and an optimal temperature of 30.0 $^\circ$C. Estimate the date on which this corn cultivar would be ready for harvest if it were grown at CUH in 2021 and planted on May 15, and discuss your findings. Use the same cultivar characteristics and run the GDD model on the Seattle hourly weather data.

a. Evaluate whether and when the commercial corn hybrid we used in @exr-corn-gdd will mature to black layer stage if it were grown under Seattle weather in 2021. 

b. Pick another year of your choice from the Seattle weather data and repeat part a.

c. Discuss whether and how the Seattle climate is conducive to growing feed corn for production. Provide plots and other modeling outputs that support your points.

### RUE corn model
Model corn growth in Seattle based on a RUE model running on hourly weather data

We can now estimate plant growth using the RUE model from @exr-corn-rue. Recall that for maize, a C4 crop, RUE is estimated to be 1.7 g of biomass per MJ of solar radiation ([Sinclair and Horie, 1989](https://doi.org/10.2135/cropsci1989.0011183X002900010023x)).

a. Estimate the total biomass accumulated per unit area ($\mathrm{g\ m^{-2}}$) by this maize cultivar grown in Seattle weather in 2021 and harvested on September 15, 2021, using the hourly weather dataset. Assume that all other conditions and characteristics were the same as in @exr-corn-rue.
b. Planting density was 8 plants per $\mathrm{m^{2}}$. What is the final biomass per plant?
c. Does the estimate for biomass accumulation look reasonable to you? Will this corn cultivar reach its maturity to produce harvestable yield? What other factors do you think are important for answering this question but missing from the information provided?

### GDD and RUE model comparisons
Were there enough heat sums for the corn crop to reach maturity as indicated by its GDD rating? Were there enough radiation sums for the corn plants to accumulate sufficient biomass? How would you compare and explain the match or mismatch between RUE-based growth and GDD-based development of corn plants as predicted by these two modeling approaches for Seattle weather? How would you improve these models?

To check whether our results are reasonable, we can compare them with those of a crop decision support tool prepared for growers in the Corn Belt at <https://hprcc.unl.edu/agroclimate/cligrow/#>, using 111 days to maturity to match the maturity rating of the hybrid used in our exercises.
