In [1]:
import pandas as pd
import xarray as xr
import numpy as np
import rioxarray as rio
import geopandas as gpd
from glob import glob
from utils import *
from geocube.api.core import make_geocube
from typing import Union
from types import NoneType
from sklearn.linear_model import LinearRegression
from rasterio.enums import Resampling
from tqdm.contrib import itertools
import os

# Calculating the fraction of forest biomass out of global biomass

## 1. Introduction
This procedure splits the spatially explicit biomass estimates into forest and non-forest fractions. The split is needed so that our final estimate can incorporate the sources (namely Pan et al. and the FRA) that cover only forest biomass. We therefore use the FRA definition of a forest: an area greater than 0.5 ha with more than 10% cover of trees taller than 5 meters.


The two main ingredients we need in order to split the biomass maps into forest and non-forest biomass are:

1. Maps of forest cover 

2. Estimates of the biomass density of forest and non-forest lands. 

### 1.1. Maps of forest cover
For the forest cover maps, we use the ESA CCI land cover maps. We rely on these maps because they provide annual data with global coverage at relatively high resolution (300 m). We map the native ESA CCI land cover types into forest and non-forest land covers (shrubland, cropland/grassland, and bare ground) based on Table S1 in [Tagesson et al. 2020](https://doi.org/10.1038/s41559-019-1090-0). 
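
As a rough illustration of the forest/non-forest binning, the sketch below collapses per-pixel area fractions of three aggregated classes into two. The class names follow the ones used later in this notebook (`forest`, `shrubland`, `cropland`); the numbers are toy values chosen for the example, not values from the ESA CCI product.

```python
import numpy as np

# Toy per-pixel area fractions for three aggregated land-cover classes
# (illustrative values only; rows are classes, columns are pixels).
landcover = ["forest", "shrubland", "cropland"]
area_frac = np.array([
    [0.6, 0.1, 0.0],   # forest
    [0.3, 0.2, 0.5],   # shrubland
    [0.1, 0.7, 0.5],   # cropland
])

forest = area_frac[landcover.index("forest")]

# In this sketch, non-forest is the sum of the shrubland and cropland fractions.
nonforest_rows = [landcover.index("shrubland"), landcover.index("cropland")]
nonforest = area_frac[nonforest_rows].sum(axis=0)

# Stack into a two-class array of shape (2, n_pixels): [forest, nonforest].
f_nonf = np.stack([forest, nonforest])
```

Because the toy fractions sum to one in every pixel, the forest and non-forest fractions do as well.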



### 1.2. Estimate forest and nonforest biomass density

We generate several estimates of the biomass density in forests and nonforests in discrete regions around the world. The estimates are based on one of two approaches:
1. Estimates from the literature
2. A spatial regression approach

For the literature estimates, we use the estimates from [Xu et al. 2021](https://www.science.org/doi/10.1126/sciadv.abe9829) and the regions defined therein.

For the regression method, we rely on the two following datasets and use [WWF regions](https://en.wikipedia.org/wiki/List_of_terrestrial_ecoregions_(WWF)) as regions: 
1. [Song et al. 2018](https://www.nature.com/articles/s41586-018-0411-9)
2. ESA CCI

### 1.3. Estimating forest biomass fraction

All of our biomass data sources that cover both forest and non-forest biomass are spatially explicit except [Besnard et al.](https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15877), which reports values for Global Fire Emissions Database (GFED) regions.

#### Spatially explicit biomass estimates

For each pixel for each year in each biomass data source, we calculate the fraction of the biomass found in forests. To do that, we use the following equation:

$$ F(t)_i^{forest} = \frac{A(t)_i^{forest} \times D(t)_i^{forest}}{A(t)_i^{forest} \times D(t)_i^{forest} +A(t)_i^{nonforest} \times D(t)_i^{nonforest}} $$

where $F(t)_i^{forest}$ is the fraction of the biomass in pixel i found in forests at time t, $A(t)_i^{forest}$ is the area of forest in pixel i at time t, $D(t)_i^{forest}$ is the biomass density of forest in pixel i at time t, $A(t)_i^{nonforest}$ is the area of non-forest in pixel i at time t, and $D(t)_i^{nonforest}$ is the biomass density of non-forest in pixel i at time t.

As stated above, the source for $A(t)_i^{forest}$ is the ESA CCI land cover maps, and $A(t)_i^{nonforest}$ is the difference between the total pixel area and $A(t)_i^{forest}$.
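
The equation above can be checked on a few toy pixels. The following is a minimal sketch, assuming areas are given as fractions of the pixel and densities in $MgC\ ha^{-1}$; the function name `forest_biomass_fraction` and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def forest_biomass_fraction(a_forest, a_nonforest, d_forest, d_nonforest):
    """Fraction of each pixel's biomass found in forest (hypothetical helper)."""
    forest_stock = np.asarray(a_forest, dtype=float) * d_forest
    nonforest_stock = np.asarray(a_nonforest, dtype=float) * d_nonforest
    total = forest_stock + nonforest_stock
    # Pixels with no biomass in either class have an undefined fraction (NaN).
    return np.divide(forest_stock, total,
                     out=np.full_like(total, np.nan), where=total > 0)

# Three toy pixels: mixed, fully forested, and empty.
a_forest = [0.4, 1.0, 0.0]
a_nonforest = [0.6, 0.0, 0.0]

# Assumed densities: 100 MgC/ha in forest, 10 MgC/ha in non-forest.
frac = forest_biomass_fraction(a_forest, a_nonforest, 100.0, 10.0)
```

In the mixed pixel, forest holds 40 of the 46 units of biomass, so a 40% forest cover translates into roughly 87% of the biomass because of the density contrast.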

$D(t)_i^{forest}$ and $D(t)_i^{nonforest}$ are defined per discrete region, and are taken either from [Xu et al. 2021](https://www.science.org/doi/10.1126/sciadv.abe9829) or from the regression estimates.

For the regression approach, we estimate the forest and nonforest biomass densities in each region for each biomass data source by pooling all of that source's pixels in the region and fitting a linear regression model. In this model, the total biomass density of each pixel (in units of $MgC\ ha^{-1}$) is expressed as the sum of the pixel's forest and nonforest area fractions, each multiplied by the corresponding regional biomass density. The regression follows the equation:
$$B_i(t) = F(t)_i^{forest} \times D(t)_r^{forest} + F(t)_i^{nonforest} \times D(t)_r^{nonforest} + \epsilon_i$$

where $B_i(t)$ is the biomass density of pixel i at time t, $F(t)_i^{forest}$ and $F(t)_i^{nonforest}$ are the fractions of forest and nonforest area in pixel i at time t, $D(t)_r^{forest}$ and $D(t)_r^{nonforest}$ are the forest and nonforest biomass densities in region r at time t, and $\epsilon_i$ is the error term. Note that in this equation $F$ denotes area fractions, not the biomass fraction defined in the previous equation.

As stated above, the data sources we use for $F(t)_i^{forest}$ and $F(t)_i^{nonforest}$ are the ESA CCI land cover dataset and [Song et al. 2018](https://www.nature.com/articles/s41586-018-0411-9), and the regions are defined by the WWF ecoregions.
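
The regression can be sketched on synthetic data. The code cells in this notebook fit the model with scikit-learn's `LinearRegression(positive=True, fit_intercept=False)`; the sketch below swaps in plain least squares via `numpy.linalg.lstsq`, so no non-negativity constraint is enforced. The pixels are noise-free and the "true" densities are assumed values, which is why the fit recovers them exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic region: 1000 pixels with random forest / non-forest area fractions.
f_forest = rng.uniform(0.0, 1.0, size=1000)
f_nonforest = 1.0 - f_forest

# Assumed "true" regional densities in MgC/ha (illustrative values only).
d_true = np.array([120.0, 8.0])

# Design matrix of shape (n_pixels, 2) and noise-free pixel biomass density.
X = np.column_stack([f_forest, f_nonforest])
B = X @ d_true

# Least squares without an intercept: B ~ X @ d_hat.
d_hat, *_ = np.linalg.lstsq(X, B, rcond=None)
```

With real, noisy data the unconstrained solution can become negative in regions dominated by one land cover, which is one reason to prefer a positivity-constrained fit.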

#### Non-spatially explicit biomass estimates

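The [Besnard et al.](https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15877) dataset does not report spatially resolved maps of global biomass for each year; it reports values per GFED region. To split these regional values into forest and non-forest biomass, we rely on the forest and non-forest biomass densities from [Xu et al. 2021](https://www.science.org/doi/10.1126/sciadv.abe9829) together with the forest and non-forest area fractions within each GFED region.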
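
For a regional data source, the split can be pictured with the steps documented in the `split_biomass_regional` function of Section 2.2: average the densities over the density regions that overlap a biomass region, weight them by the forest and non-forest area fractions, and normalize. The sketch below walks through this for one hypothetical biomass region; all numbers are toy values.

```python
import numpy as np

# Share of the biomass region's area that falls in each of two density regions.
density_region_share = np.array([0.75, 0.25])

# Forest / non-forest densities of the two density regions (toy values, MgC/ha).
densities = np.array([[100.0, 10.0],
                      [50.0, 20.0]])   # shape: (density region, landcover)

# Area-weighted mean density for the biomass region: [forest, nonforest].
region_density = density_region_share @ densities

# Forest / non-forest area fraction of the biomass region.
area_frac = np.array([0.5, 0.5])

# Biomass fraction per land cover: area fraction times density, normalized.
weights = area_frac * region_density
biomass_frac = weights / weights.sum()

# Apply the fractions to the reported regional biomass (arbitrary units).
total_biomass = 200.0
split = biomass_frac * total_biomass
```

Here the region-average densities are 87.5 and 12.5, so forest receives 87.5% of the regional biomass even though it covers half of the area.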

## 2. Define functions for the analysis

### 2.1. For gridded data sources

We define four functions used for the analysis:

1. `split_biomass`: This function splits the biomass data into forest and non-forest biomass based on the area fraction of each type and the biomass densities of each type. To calculate the biomass densities of each type, it calls the `get_biomass_density` function.

2. `get_biomass_density`: This function calculates the biomass densities of the different landcover types for each region. It supports the three methods described above (`'xu'`, `'song'`, and `'CCI'`). For the regression-based approaches, it calls the `regress_density` function.

3. `regress_density`: This function performs the regression analysis to estimate the biomass densities of the different landcover types for each region. It iterates over each region and calls the `regress_density_region` function.

4. `regress_density_region`: This function performs the regression analysis to estimate the biomass densities of the different landcover types for a single region.
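
Two array manipulations inside `split_biomass` are easier to follow on a toy grid: filling regions that have no valid density estimate, and turning the per-region density table into a map by integer indexing with the region map. The sketch below is an illustration under stated assumptions. In particular, it renormalizes the pixel-count weights over the valid regions only, which is a choice made for this example and may differ from the weighting in the function itself.

```python
import numpy as np

# Per-region densities, shape (region, landcover); region 2 has no valid fit.
densities_per_area = np.array([[100.0, 10.0],
                               [60.0, 20.0],
                               [np.nan, np.nan]])

# Integer region id of each pixel on a toy 2x3 grid.
regions = np.array([[0, 0, 1],
                    [1, 2, 2]])

# Number of pixels per region, used as weights.
counts = np.bincount(regions.ravel(), minlength=len(densities_per_area))

# Regions whose densities are all valid.
valid = ~np.isnan(densities_per_area).any(axis=1)

# Pixel-count-weighted mean density over the valid regions.
weights = counts[valid] / counts[valid].sum()
fill = weights @ densities_per_area[valid]

# Replace the missing regional densities with the weighted mean.
filled = densities_per_area.copy()
filled[~valid] = fill

# Integer indexing maps the (region, landcover) table onto the grid,
# giving an array of shape (y, x, landcover).
density_map = filled[regions]
```

The resulting `density_map` can then be multiplied by the area fractions of each land cover, pixel by pixel.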

In [2]:
def regress_density_region(ds:xr.DataArray) -> xr.DataArray:
    '''
    This function calculates the coefficients of Linear Regression for Land Cover Biomass against Land Cover categories.

    Parameters:
    ds: xarray.DataArray
        The DataArray containing the landcover and biomass data.

    Returns:
    xarray.DataArray
        The coefficients of Linear Regression for Land Cover Biomass against Land Cover categories.
    '''

    # reshape the input into a 2D array in which rows are landcover types (features) and columns are pixels (samples).
    X = ds.values.reshape((ds.shape[0],-1))

    # extract and flatten the 'biomass' coordinate from the input data.
    y = ds['biomass'].values.flatten()

    # build a mask of pixels that have valid (non-NaN) values for all landcover types and for the biomass.
    mask = (~np.isnan(X.sum(axis=0))) & ~np.isnan(y).squeeze()
    
    # keep only the valid pixels in X and y.
    X = X[:,mask]
    y = y[mask]

    # if at least 500 valid pixels remain, run Linear Regression to estimate the coefficients, else return a NaN-filled DataArray.
    if X.shape[1]>=500:
        # implement Linear Regression with positive constraint on regression weights using the filtered inputs.
        reg = LinearRegression(positive=True,fit_intercept=False).fit(X.T, y)
    
        # return a DataArray containing the learned coefficients with dimensions 'landcover'.
        return xr.DataArray(reg.coef_,dims=['landcover'])
    else:
        # create and return a DataArray with shape 'ds.shape[0]' and all elements initialized with NaN values.
        return xr.DataArray(np.full(ds.shape[0],np.nan),dims=['landcover'])


def regress_density(biomass:xr.DataArray, area_map:xr.DataArray,regions:xr.DataArray) -> np.ndarray:
    '''
    This function calculates the biomass densities for each region based on the regression method.

    Parameters:
    biomass: xarray.DataArray
        The biomass data to be split.
    area_map: xarray.DataArray
        The area map used to define the landcover types.
    regions: xarray.DataArray
        The regions used to define different biomass densities for each landcover.
    
    Returns:
    numpy.ndarray
        The biomass densities for each region for the different landcovers.
    '''

    # create a merged DataArray with the landcover map, the biomass map and the region map
    merged_ds = area_map
    merged_ds['biomass'] = biomass
    merged_ds['area_id'] = regions

    # group the DataArray by region and apply the regression function
    density = merged_ds.groupby('area_id').apply(regress_density_region)

    # convert the result into an ndarray
    density = density.to_dataframe(name='').unstack().T.droplevel(0).reindex(list(range(0,int(merged_ds['area_id'].max().values)+1)),fill_value=np.nan)
    
    return density.values

def get_biomass_density(biomass:Union[xr.DataArray,NoneType],density_args:Union[xr.DataArray,NoneType],regions:xr.DataArray,method:str) -> np.ndarray:
    '''
    This function calculates the biomass densities for each region based on the method used.

    Parameters:
    biomass: Union[xr.DataArray,NoneType]
        The biomass data to be split. If using Xu method, this parameter is not used.
    density_args: Union[xarray.DataArray,NoneType]
        Arguments for calculating the density (area_map for the regression method).
    regions: xarray.DataArray
        The regions used to define different forest/nonforest biomass densities.
    method: str
        The method to use for biomass density calculation. Options are 'xu', 'song', or 'CCI'.

    Returns:
    numpy.ndarray
        The biomass densities for each region.

    '''
    
    assert method in ['xu','song','CCI'], 'Invalid method. Options are "xu", "song" or "CCI".'
    
    if method == 'xu':
        # load Table 1 from Xu et al.
        xu_biomass_density_df = pd.read_excel('../data/regions_data/xu_et_al_2021/biomass_densities_table1.xlsx')

        # take the columns that report biomass densities
        densities = xu_biomass_density_df.filter(regex='Mg/ha')

        # calculate the average biomass density for 2000 and 2019
        densities = (densities.filter(regex='2000').values + densities.filter(regex='2019').values) / 2
        
        # store the averaged densities in the 'forest' and 'shrubland' columns
        xu_biomass_density_df[['forest','shrubland']] = densities

        # set the biomass density to 0 for cropland
        xu_biomass_density_df['cropland'] = 0

        # get the biomass densities and the code for the regions
        densities_per_area = xu_biomass_density_df[['code','forest','shrubland','cropland']].set_index('code').sort_index().values

    if method == 'song':
        # for song et al., calculate density based on average biomass density and average landcover for 
        # the entire period due to non-smooth behavior of the landcover map in time
        densities_per_area = regress_density(biomass.mean(dim='time'),density_args.mean(dim='time'),regions)
    
    if method == 'CCI':
        # regress the biomass densities for each region for each year in biomass
        densities_per_area = np.stack([regress_density(biomass.sel(time=year),density_args.sel(time=year),regions) for year in biomass['time']])

    return densities_per_area
    

def split_biomass(biomass:xr.DataArray,area_frac:xr.DataArray,density_args:Union[xr.DataArray,NoneType], regions:Union[gpd.GeoDataFrame,xr.DataArray], method:str,test=False) -> xr.DataArray:
    '''
    This function splits the biomass data into forest and non-forest biomass based on the landcover map and the area fraction of each landcover type.

    The stages of the analysis are:
    - Resample the area fraction data to match the biomass data resolution.
    - Calculate the biomass densities based on the method used.
    - Split the biomass data into forest and non-forest biomass based on the landcover map and the area fraction of each landcover type.

    Parameters:
    biomass: xarray.DataArray
        The biomass data to be split.
    area_frac: xarray.DataArray
        The area fraction of each landcover type.
    density_args: Union[xarray.DataArray,NoneType]
        Arguments for calculating the density (area_map for the regression method).
    regions: xarray.DataArray or geopandas.GeoDataFrame
        The regions used to define different forest/nonforest biomass densities.
    method: str
        The method to use for biomass density calculation. Options are 'xu', 'song', or 'CCI'.
    test: bool
        If True, the function will output the data to allow comparing it with the original data. Default is False.
    
    Returns:
    xarray.DataArray
        The split biomass data.
    '''
    
    assert method in ['xu','song','CCI'], 'Invalid method. Options are "xu", "song" or "CCI".'

    ## 1. Resampling of input data ##

    
    # select the time period of the area fraction data that matches the biomass data
    area_frac = area_frac.sel(time = biomass.time)
    
    # resample the area fraction data to match the biomass data resolution
    area_frac = xr.concat([resample_match(area_frac.sel(landcover=i),biomass) for i in area_frac['landcover']],dim='landcover')

    # remove artifacts from the resampling
    area_frac = area_frac.where(area_frac<1e30)

    if method != 'xu':
        # for both regression based approaches ('song' or 'CCI'), the regions are the wwf ecoregions. We convert the GeoDataFrame to a xr.DataArray
        regions = make_geocube(vector_data=regions,measurements=['id'],like= biomass)['id']

        # for the regression based approaches, we need a forest area map ('song' or 'CCI'), so we need to resample the area map to match the biomass data resolution
        if method == 'CCI':
            # avoid resampling twice if CCI is used both for the area and also for the regression
            density_args = area_frac
        else:
            density_args = xr.concat([resample_match(density_args.sel(landcover=i),biomass) for i in density_args['landcover']],dim='landcover')
    
    # resample regions to match the biomass data resolution
    regions = regions.rio.reproject_match(biomass[0,:,:],nodata=np.nan, resampling=Resampling.nearest)

    # fill in missing region values based on the nearest valid region
    regions = regions.rio.interpolate_na()

    ## 2. Calculate biomass densities ##
    # calculate the biomass densities per region
    densities_per_area = get_biomass_density(biomass,density_args,regions,method=method)
    
    # create a mask for NaN values for the regions
    nan_mask = np.isnan(regions.values)

    if len(densities_per_area.shape) == 2:
        # if densities do not have a time dimension (xu or song method) add the time dimension
        densities_per_area = densities_per_area[np.newaxis,:,:]

    # fill in NaN values in the biomass density using the mean over regions with valid values, weighted by each region's pixel count
    # count the occurrences of each region in the biomass data
    region_counts = regions.where(biomass.mean(dim='time')>0).to_dataframe('id')['id'].value_counts()

    # calculate the weighted average biomass density based on the occurrences of each region and their biomass densities
    average_densities = np.nansum(densities_per_area[:,region_counts.index.astype(int),:].transpose(0,2,1) * (region_counts/region_counts.sum()).values,axis=2)

    # fill in NaN values in the biomass density using the time average on the average_densities
    densities_per_area[np.isnan(densities_per_area.sum(axis=2))] = average_densities.mean(axis=0)

    #convert the biomass densities per area into maps of biomass density using the region map
    density = densities_per_area[:,regions.fillna(0).values.astype(int)]
    
    # reapply the NaN mask
    density[:,nan_mask,:] = np.nan    

    # transpose the density data to match the dimensions of the biomass data
    density = density.transpose(3,1,2,0)

    ## 3. Split biomass data ##

    # calculate the fraction of biomass for each landcover type
    fraction = density*area_frac.transpose('landcover','y','x','time')
    fraction = (fraction/fraction.sum(dim='landcover')).transpose('landcover','time','y','x')

    # multiply by the total biomass to get the final answer 
    result = fraction * biomass

    if test:
        # if landcover has three values, we need to sum cropland and shrubland into nonforest
        if 'cropland' in result['landcover']:
            
            # in test mode, sum cropland and shrubland so the result can be compared with the original data when summed across landcovers
            nonforest = result.sel(landcover=['cropland','shrubland']).sum(dim='landcover').expand_dims('landcover')
        else:
            return result
    else:
        # otherwise, define nonforest as the residual so that forest and nonforest sum exactly to the original data
        nonforest = (biomass - result.sel(landcover=['forest']).fillna(0))
        
    nonforest['landcover'] = ['nonforest']
    result = xr.concat([result.sel(landcover=['forest']),nonforest],dim='landcover')

    return result

### 2.2. For regional data sources

In [3]:
def calc_region_stats(values:xr.DataArray,region:xr.DataArray,gb_dim:list) -> pd.DataFrame:
    '''
    This function calculates the fraction of each value in the values DataArray for each region in the region DataArray.

    Parameters:
    values: xarray.DataArray
        The values to calculate the regional fraction for.
    region: xarray.DataArray
        The regions to calculate statistics based on.
    gb_dim: list
        The dimensions to group by when normalizing the result (for example, include 'time' if the data has a time dimension).

    Returns:
    pandas.DataFrame
        The fraction of each value in the values DataArray for each region in the region DataArray.
    '''

    # calculate the total for each value in values by accounting for the area of each grid cell
    merged_ds = values*calc_area(values)

    # add to the merged DataArray the region DataArray
    merged_ds['region'] = region

    # group the DataArray by region and sum the values for each region
    df = merged_ds.drop_vars('spatial_ref').groupby('region').apply(lambda x: x.sum(dim='stacked_y_x')).to_dataframe(name='area')

    # normalize the sum by the total sum for the gb_dim dimensions
    df = df.div(df.groupby(gb_dim).sum())

    return df

def split_biomass_regional(biomass_regions:xr.DataArray, biomass:pd.DataFrame ,metadata:pd.DataFrame, density_regions:xr.DataArray, forest_map:xr.DataArray) -> pd.DataFrame:
    '''
    This function splits the regional biomass data into forest and non-forest biomass based on the biomass densities defined by Xu et al. (2021).

    The analysis has 4 steps:
    1. Load and resample inputs - resample the forest area fraction data to match the region_map data resolution. Load Xu et al. biomass density data.
    2. Calculate biomass densities and forest/nonforest stats per region - for each biomass_region, calculate area fraction of each density_region and of forest/nonforest area fraction.
    3. Calculate average biomass density per biomass_region - calculate the average biomass density per biomass_region by multiplying the area fraction of each density_region by its corresponding biomass density.
    4. Calculate biomass fraction per region - calculate the biomass fraction per biomass_region by multiplying the forest/nonforest area fraction by the biomass density

    Parameters:
    biomass_regions: xarray.DataArray
        The regions used to define the biomass data.
    biomass: pandas.DataFrame
        The regional biomass data.
    metadata: pandas.DataFrame
        The names of the biomass regions.
    density_regions: xarray.DataArray
        The regions used to define different forest/nonforest biomass densities.
    forest_map: xarray.DataArray
        The forest/nonforest area fraction data.

    Returns:
    pandas.DataFrame
        The split biomass data.
    '''
    
    ## 1. Load and resample inputs
    
    # resample the forest area fraction data to match the region map data resolution
    forest_map = xr.concat([resample_match(forest_map.sel(landcover=i),biomass_regions) for i in forest_map['landcover']],dim='landcover')
    
    # load Xu et al. biomass density data
    regional_biomass_density = get_biomass_density(None,None,biomass_regions,method='xu')
    regional_biomass_density = pd.DataFrame(regional_biomass_density[:,:-1],index = np.arange(19),columns=['forest','nonforest'])
    
    ## 2. Calculate biomass densities and forest/nonforest stats per region

    # for each biomass_region, calculate area fraction of each density_region in the biomass density data in Xu et al. and for forest/nonforest area fraction
    region_biomass_density_area_frac = calc_region_stats(density_regions,biomass_regions,gb_dim='region')
    region_f_non_f_frac = calc_region_stats(forest_map,biomass_regions,gb_dim=['region','time'])

    ## 3. calculate average biomass density per biomass_region

    # merge the density_region area fraction with the biomass density data 
    region_density = region_biomass_density_area_frac.merge(regional_biomass_density,left_on='region',right_index=True)

    # calculate the average biomass density per region by multiplying the area fraction of each density region by the biomass density
    region_density = region_density.groupby('region').apply(lambda x: (x[['forest','nonforest']].mul(x['area'],axis=0)).sum())
    
    # stack the data and rename the columns
    region_density.columns.name = 'landcover'
    region_density =region_density.stack()
    region_density.name='density'

    ## 4. calculate biomass fraction per region

    # merge the forest/nonforest area fraction with the biomass density data
    region_merge = region_f_non_f_frac.merge(region_density,left_index=True,right_index=True)

    # calculate the biomass fraction per region by multiplying the forest/nonforest area fraction by the biomass density and normalizing to the sum of both
    region_biomass_frac = region_merge.prod(axis=1)/region_merge.prod(axis=1).groupby(['region','time']).sum()
    
    # rename the index to the region names
    ind_map = {x.loc['id']:x.loc['name'] for _,x in metadata.iterrows()}
    region_biomass_frac = region_biomass_frac.rename(index=ind_map)

    # multiply biomass estimate by biomass fraction to get biomass per region
    biomass = biomass.stack()
    biomass.name='biomass'
    biomass.index.names = ['region','time']
    split_biomass = pd.DataFrame(region_biomass_frac,columns=['biomass_frac']).merge(biomass,left_index=True,right_index=True).prod(axis=1).swaplevel(1,2).unstack()
    
    return split_biomass

## 3. Load data

### 3.1. Land cover data

In [4]:
# Load ESA CCI landcover data
files = glob('../results/00_preprocessing/ESA_CCI_landcover_processed_*.nc')
CCI_data = xr.open_mfdataset(files)['ESA_CCI_landcover_processed']
CCI_data.rio.write_crs('EPSG:4326',inplace=True)

# convert the time to year
CCI_data['time'] = CCI_data['time'].dt.year

# only take data with values larger than 0
CCI_data = CCI_data.where(CCI_data>0)

# calculate the fraction of each landcover type in each pixel
CCI_data = (CCI_data/calc_pixel_area(CCI_data)).rio.set_nodata(np.nan)

# for the biomass density estimate, bin the areas into forest and non-forest
f_nonf_map = CCI_data.copy()

# set the non-forest area as the sum of cropland and shrubland
f_nonf_map[0,:,:,:] = CCI_data.sel(landcover=['cropland','shrubland']).sum(dim='landcover')

# remove the shrubland dimension
f_nonf_map = f_nonf_map[:2,:,:]

# rename the landcover types
f_nonf_map['landcover'] = ['nonforest','forest']

f_nonf_map = f_nonf_map.sel(landcover=['forest','nonforest'])

In [5]:
# Load Song et al. data
song_data = xr.open_dataarray('../results/00_preprocessing/song_et_al_landcover_processed.nc')

### 3.2. Regions

In [6]:
# load wwf ecoregions
wwf_ecoregions = gpd.read_file('../results/00_preprocessing/agg_wwf_ecoregions.shp')

In [7]:
# load region data from Xu et al.
xu_regions = rio.open_rasterio('../data/regions_data/xu_et_al_2021/global_ecoregions.tif').sel(band=1)

# replace regions 0,15, and 16 with NaN and convert values to start from 0
xu_regions = xr.where(xu_regions.isin([0,15,16]),np.nan,xu_regions-101)
xu_regions.rio.write_crs('EPSG:4326',inplace=True);

# for the regional analysis, we need to one-hot encode the regions along a separate dimension
xu_regions_types = np.unique(xu_regions.fillna(0))
xu_regions_onehot = xr.concat([(xu_regions==x).astype(float) for x in xu_regions_types],dim='xu_region').drop_vars('band')
xu_regions_onehot['xu_region'] = xu_regions_types

In [8]:
# load GFED regions
GFED_regions = rio.open_rasterio('../data/regions_data/GFED/GFED5_Beta_monthly_2002.nc',variable='basisregions').sel(band=1)['basisregions']
GFED_regions.rio.write_crs(4326,inplace=True);

# replace 0 values with NaN
GFED_regions = GFED_regions.where(GFED_regions!=0)

# set the nodata value to NaN
GFED_regions.rio.write_nodata(np.nan,inplace=True);

# reproject to the xu_regions resolution because we will use xu_regions in the forest/nonforest split
GFED_regions = GFED_regions.rio.reproject_match(xu_regions).drop_vars('band')

# load the names of each region
GFED_region_names = pd.read_excel('../data/biomass/besnard_et_al_2021/data.xlsx',sheet_name='region_names')

### 3.3. Biomass data sources

#### [Liu et al. (2015)](https://www.nature.com/articles/nclimate2581)

In [9]:
liu_data = rio.open_rasterio('../data/biomass/liu_et_al_2015/Global_annual_mean_ABC_lc2001_1993_2012_20150331.nc',masked=True)['Aboveground Biomass Carbon']

# set the coordinates to the same as the other datasets
liu_data = xr.DataArray(data=liu_data.values.swapaxes(2,1)[:,:,::-1],
                    coords=[liu_data['time'].values,np.linspace(89.875,-89.875,720),np.linspace(-179.875,179.875,1440)],
                    dims=['time','y','x'])
liu_data = liu_data.rio.write_crs(4326)
liu_data['time'] = liu_data['time'].astype(int)

#### [Xu et al. (2021)](https://www.science.org/doi/10.1126/sciadv.abe9829)

In [10]:
# Load data
xu_data = rio.open_rasterio('../data/biomass/xu_et_al_2021/test10a_cd_ab_pred_corr_2000_2019_v2.tif',masked=True,chunks='auto')

# set the time dimension to integer years
xu_data['time'] = xu_data['time'].dt.year

#### [Chen et al. (2023)](https://essd.copernicus.org/articles/15/897/2023/)

In [11]:
# Load data
files = [sorted(glob(f'../data/biomass/chen_et_al_2023/DATA/{i}/*.tif')) for i in ['AGBC','BGBC']]
chen_agb = xr.open_mfdataset(files[0],concat_dim='time',combine='nested').squeeze()['band_data']
chen_bgb = xr.open_mfdataset(files[1],concat_dim='time',combine='nested').squeeze()['band_data']

# combine above and below ground biomass
chen_data = chen_agb.copy()
chen_data[:] = chen_agb.values + chen_bgb.values

# set the time dimension to integer years
chen_data['time'] = [int(i.split('.')[2][-4:]) for i in files[0]]

# set the nodata variable
chen_data.rio.set_nodata(np.nan,inplace=True)

# down sample to 0.1 degree resolution
chen_data = down_sample(chen_data,x_factor=12,y_factor=12,stat='mean')
chen_data = chen_data.where(chen_data>0)

# drop year 2021
chen_data = chen_data.sel(time=chen_data['time']!=2021)

#### L-VOD data

In [12]:
LVOD_data = xr.open_dataset('../data/biomass/LVOD/AGC_vod_annual_NOAA_Trend_corrected_lat_lon_merged.nc')['AGC_ASC_DESC']
LVOD_data.rio.write_crs(4326,inplace=True);

LVOD_data_ASC_DESC = LVOD_data[0,:,:,:]
LVOD_data_ASC_DESC_max = LVOD_data[1,:,:,:]
LVOD_data_ASC_DESC_min = LVOD_data[2,:,:,:]

#### [Besnard et al. (2021)](https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15877)

In [13]:
besnard_data = pd.read_excel('../data/biomass/besnard_et_al_2021/data.xlsx',sheet_name='biomass')
besnard_data = besnard_data.set_index(['Region','Year'])['NABP'].unstack().drop(index='Global')

## 4. Run analysis

### 4.1. For gridded data sources

In [14]:
# define the specific arguments for each biomass source
biomass_sources = [liu_data,xu_data,chen_data, LVOD_data_ASC_DESC,LVOD_data_ASC_DESC_max,LVOD_data_ASC_DESC_min]
biomass_names = ['liu_biomass','xu_biomass','chen_biomass','LVOD','LVODmax','LVODmin']
area_frac_maps = {'xu':CCI_data,'song':f_nonf_map,'CCI':CCI_data}
density_args = {'xu':None,'song':song_data,'CCI':CCI_data}
regions = {'xu':xu_regions,'song':wwf_ecoregions,'CCI':wwf_ecoregions}

In [15]:
overwrite = True
for i, method in itertools.product(range(len(biomass_sources)),['xu','song','CCI']):
    print(biomass_names[i],method)
    # skip existing outputs unless overwriting is requested
    if os.path.exists(f'../results/01_split_forest_nonforest/{biomass_names[i]}_{method}.nc') and not overwrite:
        continue
    res = split_biomass(biomass     = biomass_sources[i],
                        area_frac   = area_frac_maps[method],
                        density_args= density_args[method],
                        regions     = regions[method],
                        method      = method,
                        )
    res.to_netcdf(f'../results/01_split_forest_nonforest/{biomass_names[i]}_{method}.nc')

  0%|          | 0/18 [00:00<?, ?it/s]

liu_biomass xu
liu_biomass song
liu_biomass CCI
xu_biomass xu
xu_biomass song
xu_biomass CCI
chen_biomass xu
chen_biomass song


  return func(*(_execute_task(a, cache) for a in args))

  AtA = A.T @ A
  w[:] = Atb - AtA @ x


chen_biomass CCI
LVOD xu
LVOD song
LVOD CCI
LVODmax xu
LVODmax song
LVODmax CCI
LVODmin xu
LVODmin song
LVODmin CCI


### 4.2. For regional data sources

The only data source with regional rather than gridded resolution is Besnard et al. (2021).

In [16]:
result = split_biomass_regional(biomass_regions = GFED_regions,
                                biomass         = besnard_data*1e15,
                                metadata        = GFED_region_names,
                                density_regions = xu_regions_onehot,
                                forest_map      = f_nonf_map
                                )

result.to_csv('../results/01_split_forest_nonforest/besnard_biomass_regional.csv')
