Skip to content

willmorrison1/daRt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

daRt

The daRt package provides a very quick and flexible way to import data that is produced by the Discrete Anisotropic Radiative Transfer (DART) model (CESBIO). The data in daRt are formatted in a way that facilitates rapid data analysis. Formal documentation is available in the pdf manual.

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
# devtools::install_github doesn't like my github package dependencies 
remotes::install_github("willmorrison1/daRt")

Load the package

library(daRt)
#> Warning: package 'data.table' was built under R version 3.6.3
#> Warning: package 'foreach' was built under R version 3.6.3
#> Warning: package 'doParallel' was built under R version 3.6.3
#> Warning: package 'iterators' was built under R version 3.6.3
#> Warning: package 'shadowtext' was built under R version 3.6.3
#> Warning: package 'fields' was built under R version 3.6.3
#> Warning: package 'spam' was built under R version 3.6.3
#> Warning: package 'dotCall64' was built under R version 3.6.3
#> Warning: package 'maps' was built under R version 3.6.3
#> Warning: package 'chron' was built under R version 3.6.3
#> Warning: package 'raster' was built under R version 3.6.3
#> Warning: package 'sp' was built under R version 3.6.3

Overview

This section demonstrates the most basic use of daRt to load the “directions” data product for the default “cesbio” simulation provided in this respository.

Define a simulation directory

simulationDir <- "man/data/cesbio"

simulationFilter() determines the type of files you want to load (here with defaults).

sF <- daRt::simulationFilter(product = "directions")

getData() loads data for the given simulation using the predetermined file type(s).

simData <- daRt::getData(x = simulationDir, sF = sF)

as.data.frame() releases the data object as a “long” format data frame.

DF <- as.data.frame(simData)
head(DF, n = 3)
#> # A tibble: 3 x 8
#> # Groups:   band, iter, typeNum, simName [1]
#>   zenith azimuth value  band variable iter  typeNum simName
#>    <dbl>   <dbl> <dbl> <int> <chr>    <chr> <chr>   <chr>  
#> 1    0         0 0.646     0 BRF      ITER1 ""      cesbio 
#> 2   22.4      30 0.612     0 BRF      ITER1 ""      cesbio 
#> 3   22.4      90 0.598     0 BRF      ITER1 ""      cesbio

Define data to load: SimulationFilter

The ‘SimulationFilter’ object describes what data you want to extract from a DART output directory structure. Show the current configuration of the SimulationFilter

sF
#> 'SimulationFilter' object for DART product: directions 
#> 
#> bands:          0, 1 
#> variables:      BRF 
#> iterations:     ITER1, ITER2 
#> variablesRB3D:  Intercepted, Scattered, Emitted, Absorbed, +ZFaceExit, +ZFaceEntry 
#> typeNums:        
#> imageTypes:      ima, camera 
#> imageNums:

List the ‘setter’ and ‘accessor’ methods available

methods(class = "SimulationFilter")
#>  [1] bands              bands<-            getData            getFiles          
#>  [5] imageFiles         imageNums          imageNums<-        imageTypes        
#>  [9] imageTypes<-       iters              iters<-            product           
#> [13] product<-          show               simulationFilter<- subDir            
#> [17] typeNums           typeNums<-         variables          variables<-       
#> [21] variablesRB3D      variablesRB3D<-   
#> see '?methods' for accessing help and source code

Use these methods to edit the SimulationFilter object e.g. the bands or iters (iterations) that you want to load

bands(sF) <- 0:2
iters(sF) <- "ITER3"

Browse files: SimulationFiles

The ‘SimulationFiles’ object contains all information on the files that will be loaded, based on the provided SimulationFilter. It is used to explore the DART output directory structure. First define the simulation directory. For this example, simulationDir is a relative directory (based on the github data provided) and consists of one simulation.

#define the simulation directory
simulationDir <- "man/data/cesbio"

If you install the package using remotes::install_github then the “cesbio” simulation files will not be available automatically. To use these files, get them from github manually or use your own ‘cesbio’ simulation which is shipped with the DART model by default.

The simulation directory should be the base directory of the simulation. E.g. within simulationDir there should be the simulation ‘input’ and ‘output’ directories.

list.files(simulationDir)
#> [1] "input"  "output"

Now we have the simulation directory clarified, explore the files in the simulation that correspond to this filter

simFiles <- daRt::getFiles(x = simulationDir, sF = sF)

Explore the output of this to check we happy to continue and load the data. getFiles() is essentially a ‘dry-run’ of the data extraction

dataFiles <- fileName(simFiles)
all(file.exists(dataFiles))
#> [1] TRUE

Load data: SimulationData

The SimulationData object contains all data for the given SimulationFilter. Do the following to extract DART output data using the getData() method

simData <- daRt::getData(x = simulationDir, sF = sF)
#also can do this using simFiles object
simData_fromFiles <- daRt::getData(x = simFiles)
identical(simData_fromFiles, simData)
#> [1] TRUE

Analysis: as.data.frame()

By having data in a “long” format, it is easy to perform analysis on the data. Once you are ready to use the data, retrieve it using as.data.frame().

#plot using ggplot2
library(ggplot2)
DFdata <- as.data.frame(simData)
plotOut <- ggplot(DFdata) +
  geom_point(aes(x = zenith, y = value, colour = azimuth)) +
  facet_wrap(~ band) +
  theme(aspect.ratio = 1)
plot(plotOut)

Expanded examples

This section provides further examples of package use.

Defining data to load: SimulationFilter for images

To look at images for bands 0, 1 and 2; iters (iterations) 1 and 2, and imageNums (image numbers) 5 and 7, create the relevant SimulationFilter then load the data

#create SimulationFilter
sF <- daRt::simulationFilter(product = "images", 
                             bands = as.integer(0:2),
                             iters = c("ITER1", "ITER2"),
                             variables = "BRF",
                             imageNums = as.integer(c(5, 7)),
                             imageTypes = "ima") 
#load data - 'nCores' allows parallel processing of files.
#It is useful for access to drives that have optimised paralell I/O.
#here load data using 2 cores.
simData <- daRt::getData(x = simulationDir, sF = sF, nCores = 2)
#simple plot of data
ggplot(simData %>% as.data.frame()) + 
  geom_raster(aes(x = x, y = y, fill = value)) +
  facet_grid(band ~ imageNum + iter) +
  theme(aspect.ratio = 1) 

Working with output data: radiative budget

Alter the SimulationFilter again to now look at files for the radiative budget product.

product(sF) <- "rb3D"
simData <- daRt::getData(x = simulationDir, sF = sF, nCores = 2)
#> Warning in filesFun(x = x[i], sF = sF): Product is 'rb3D'. Forcing
#> 'RADIATIVE_BUDGET' variable in 'simulationFilter' variables.

The 3D radiative budget data are stored with the X, Y and Z location of each cell (conforming to DART coordinate system i.e. "the part of the scene that horizontally is ‘top left’ and vertically is at the bottom is: X = 1, Y = 1, Z = 1), stored in 3 columns.

head(as.data.frame(simData), n = 3)
#> # A tibble: 3 x 9
#> # Groups:   band, iter, typeNum, simName [1]
#>       X     Y     Z value variableRB3D  band iter  typeNum simName
#>   <int> <int> <int> <dbl> <chr>        <int> <chr> <chr>   <chr>  
#> 1     1     1     1  1.01 Intercepted      0 ITER1 ""      cesbio 
#> 2     2     1     1  1.02 Intercepted      0 ITER1 ""      cesbio 
#> 3     3     1     1  1.01 Intercepted      0 ITER1 ""      cesbio

The below example uses “dplyr” to work with this data. Here we look at the lowest horizontal layer of each 3D radiative budget array (i.e. Z = 1).

library(dplyr)

#filter lowest horizontal cross section of the radiative budget
simData_filtered <- simData %>%
  as.data.frame() %>%
  dplyr::filter(Z == 1)

ggplot(simData_filtered) + 
  geom_raster(aes(x = X, y = Y, fill = value)) +
  facet_grid(band ~ variableRB3D) +
  theme_bw() +
  theme(panel.spacing = unit(0, "cm"), 
        strip.text = element_text(size = 6, 
                                  margin = margin(0.05, 0.05, 0.05, 0.05, unit = "cm"))) +
  scale_fill_distiller(palette = "Spectral") +
  theme(aspect.ratio = 1)

Extract simulation meta data

wavelengths(simData)
#>   simName band lambdamin lambdamid lambdamax equivalentWavelength
#> 1  cesbio    0    0.6200    0.6850    0.7500                   NA
#> 2  cesbio    1    0.2500    0.5785    0.9070                   NA
#> 3  cesbio    2    0.4495    0.4720    0.4945                   NA

sunAngles(simData)
#>   simName   sunPhi sunTheta
#> 1  cesbio 136.3097  132.839

versionInfo(simData)
#>   version buildFull buildNumber
#> 1   5.7.4     v1091        1091

resourceUse(simData)
#> Warning in searchDartTxtVal(rawFileDATA, searchQuote = "Processing time"): Could
#> not get ' Processing time ' info from dart.txt
#> Warning in searchDartTxtVal(rawFileDATA, searchQuote = "Memory usage"): Could
#> not get ' Memory usage ' info from dart.txt
#>   simName timeTaken memUsage
#> 1  cesbio        NA       NA

Miscellaneous

Memory management: tips

Loading many files/variables may require memory management. getData() loads all requested data to memory which can be problematic for large files (e.g. Radiative Budget). It is assumed that the user will perform some analysis on subsets of the raw data in a way that reduces the overall memory footprint. To demonstrate memory management, files in this section are loaded in two different scenarios: scenario 1 uses the default getData() to load and then analyse all data at once. Scenario 2 loads and analyses the data in pieces, which has a much smaller memory footprint (but may be slower). Both scenarios give the same result.

Scenario 1: Load data all at once

Load all radiative budget products at once into memory and take the mean of each horizontal layer.

sF <- daRt::simulationFilter(product = "rb3D", 
                             bands = as.integer(0:2), 
                             iters = c("ITER1", "ITER2", "ILLUDIFF", "ILLUDIR"),
                             typeNums = "",
                             variables = "RADIATIVE_BUDGET")
simFiles <- daRt::getFiles(simulationDir, sF = sF)

There are twelve files each with 6 variables and each as a 3D array - i.e. quite a lot of data. Load in the data all at once.

simData <- daRt::getData(x = simFiles, nCores = 2)

and gives a relatively large array of data

DFdata <- as.data.frame(simData)
head(DFdata, n = 3)
#> # A tibble: 3 x 9
#> # Groups:   band, iter, typeNum, simName [1]
#>       X     Y     Z value variableRB3D  band iter  typeNum simName
#>   <int> <int> <int> <dbl> <chr>        <int> <chr> <chr>   <chr>  
#> 1     1     1     1  1.01 Intercepted      0 ITER1 ""      cesbio 
#> 2     2     1     1  1.02 Intercepted      0 ITER1 ""      cesbio 
#> 3     3     1     1  1.01 Intercepted      0 ITER1 ""      cesbio
dim(DFdata)
#> [1] 784080      9

Do some analysis on the data. Get the mean of non-zero values across each vertical layer of each variablesRB3D, bands, iters (already grouped) according to the above column names

statVals <- DFdata %>%
  dplyr::group_by(X, Y, variableRB3D, add = TRUE) %>%
  dplyr::summarise(meanVal = mean(value[value != 0], na.rm = TRUE))

Scenario 2: Load data in sections and process each section

Do ‘scenario 1’ analysis but with data processed for each band separately to save on memory usage.

sF <- daRt::simulationFilter(product = "rb3D", 
                             bands = as.integer(0:2), 
                             iters = c("ITER1", "ITER2", "ILLUDIFF", "ILLUDIR"),
                             typeNums = "",
                             variables = "RADIATIVE_BUDGET")
allBands <- bands(simData)
allBands
#> [1] 0 1 2
simDataList <- vector(mode = "list", length = length(allBands))
for (i in 1:length(allBands)) {
  bands(sF) <- allBands[i]
  simDataPiece  <- daRt::getData(x = simulationDir, sF = sF, nCores = 2)
  simDataList[[i]] <- simDataPiece %>%
    as.data.frame() %>%
    dplyr::group_by(X, Y, variableRB3D, add = TRUE) %>%
    dplyr::summarise(meanVal = mean(value[value != 0], na.rm = TRUE))
}

Now put together the list of data. As each list element is a summary of the raw data, it has a much smaller memory footprint. As the summary was performed on one band at a time, the amount of data loaded at once is less than if getData() was executed for all bands at once (scenario 1). By loading one band at a time as opposed to all three at once, the memory footprint is around 1/3 of scenario 1.

simDataDF <- dplyr::bind_rows(simDataList)

statVals1 <- simDataDF

Both scenarios give the same results

all.equal(statVals, statVals1)
#> [1] TRUE

but by processing in parts, the latter (scenario 2) - produced by ‘statVals1’ - has a smaller memory footprint as the stats are calculated for each band separately. When inter-band stats are required, the example can be adapted to iterate over e.g. iters or variablesRB3D.

Radiative budget files: compression

DART radiative budget files are raw binary and can get very large. rb3DtoNc converts .bin to NetCDF (.nc) format, which gives smaller files sizes and can be compressed.

Get some DART radiative budget binary data (the default data)

simulationDir <- "man/data/cesbio"
sF <- daRt::simulationFilter(product = "rb3D",
                             bands = as.integer(1), 
                             iters = "ITER1",
                             typeNums = "",
                             variables = "RADIATIVE_BUDGET")
simFiles_bin <- daRt::getFiles(simulationDir, sF = sF)
simData_bin <- as.data.frame(daRt::getData(simFiles_bin, nCores = 2))
#get the file size - for later comparison
fileSize_bin <- file.size(fileName(simFiles_bin))

Convert the .bin data to .nc. The .bin file will be deleted by rb3DtoNc.

simFiles_nc <- daRt::rb3DtoNc(simFiles_bin)
simData_nc <- as.data.frame(daRt::getData(simFiles_nc, nCores = 2))

There are some very minor differences in the two products - likely due to the ncdf compression algorithm and/or rounding.

max(abs(simData_nc$value - simData_bin$value))
#> [1] 9.187445e-08

The new .nc file is much smaller:

fileSize_nc <- file.size(fileName(simFiles_nc))
fileSize_nc / fileSize_bin
#> [1] 0.127663

and is much faster to read. It can also be read by third party NetCDF browsers e.g. ncview.

Unwanted files: deletion

DART can output many unwanted files. Use deleteFiles() to delete files based on a provided SimulationFiles object. Here delete all directions files from iters 1 and bands 1.

Define the file for deletion

sF <- daRt::simulationFilter(product = "directions", 
                             bands = 1L, 
                             iters = "ITER1")
filesToDelete <- daRt::getFiles(x = simulationDir, sF = sF)

then delete the file, where deleteSimulationFiles parameter makes the user sure to know that they are deleting output data!

deleteFiles(x = filesToDelete, deleteSimulationFiles = TRUE)
#> NULL

Parallelisation issues

Under windows especially, there may be some issues with paralellisation. When running getData() you may get “invalid connection” error. I may have sorted the issue but if it reappears then this function seems to help:

unregister <- function() {
  env <- foreach:::.foreachGlobals
  rm(list = ls(name = env), pos = env)
}

Releases

No releases published

Packages

 
 
 

Languages