Python tools for analyzing lists of input features and outputs from gfw_forest_loss_geotrellis.
Input lists have list_id, location_id, and geom columns; each list corresponds to a user.
They can be read into a GeoPandas GeoDataFrame:
In [6]: from wri_list_tools import read_list_tsv
...:
...: read_list_tsv("tests/palm-generic-32.tsv")
Out[6]:
list_id geometry
location_id
1 1 POLYGON ((80.44360 6.01474, 80.43591 6.00847, ...
2 1 POLYGON ((99.50002 7.93557, 99.49631 7.92656, ...
3 1 MULTIPOLYGON (((98.91782 8.04344, 98.91676 8.0...
4 1 MULTIPOLYGON (((98.91782 8.04344, 98.91676 8.0...
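read_list_tsv handles the parsing for you. Conceptually it is similar to the sketch below, which assumes the geom column holds WKT strings (that encoding is an assumption for illustration, not a documented contract):

# Rough sketch: reading a list TSV with plain pandas/GeoPandas.
# Assumes the geom column contains WKT; read_list_tsv is the supported way to do this.
import pandas as pd
import geopandas as gpd
from shapely import wkt

def read_list_tsv_sketch(path):
    df = pd.read_csv(path, sep="\t")
    df["geometry"] = df["geom"].map(wkt.loads)
    gdf = gpd.GeoDataFrame(df.drop(columns="geom"), geometry="geometry")
    return gdf.set_index("location_id")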
Job outputs from gfw_forest_loss_geotrellis also come in TSV format, written by Spark into the final
subdirectory of the job output directory. They can be read into a Pandas DataFrame:
In [10]: from wri_list_tools import read_job_output
In [11]: job_df = read_job_output("tests/fcd_output")
In [12]: job_df
Out[12]:
list_id location_id ... commodity_threat_protected_areas commodity_threat_fires
0 1 26 ... {"2002":634.7188,"2003":635.1781,"2004":636.63... {"2001":7.5,"2002":9.5,"2003":13.5,"2004":20.0...
1 1 2 ... {"2002":16.4619,"2003":29.6455,"2004":42.2971,... {"2002":25.0,"2003":53.0,"2004":59.5,"2005":79...
2 1 14 ... {"2002":4603.0059,"2003":4603.0058,"2004":4603... {"2001":18.0,"2002":28.0,"2003":25.5,"2004":31...
3 1 11 ... {"2002":0.0,"2003":0.0,"2004":0.0,"2005":0.0,"... {"2002":103.5,"2003":163.0,"2004":125.0,"2005"...
4 1 15 ... {"2002":367.2074,"2003":375.3591,"2004":352.59... {"2001":37.5,"2002":485.5,"2003":843.0,"2004":...
[33 rows x 46 columns]
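read_job_output is the supported entry point. A rough equivalent, assuming the final subdirectory holds one or more *.tsv part files written by Spark (the exact file layout is an assumption), might look like:

# Hypothetical sketch: concatenate Spark-written TSV part files from the final/ subdir.
import glob
import os
import pandas as pd

def read_job_output_sketch(job_dir):
    parts = sorted(glob.glob(os.path.join(job_dir, "final", "*.tsv")))
    return pd.concat((pd.read_csv(p, sep="\t") for p in parts), ignore_index=True)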
This is useful, but as you can see, some of the columns are histograms of yearly values encoded as JSON, which are hard to work with directly. Each JSON column can be expanded into a "flat" DataFrame:
In [14]: import json
In [15]: from wri_list_tools import expand_dic_col
In [16]: col = job_df["commodity_threat_protected_areas"].map(json.loads)
In [17]: col
Out[17]:
0 {'2002': 634.7188, '2003': 635.1781, '2004': 6...
1 {'2002': 16.4619, '2003': 29.6455, '2004': 42....
2 {'2002': 4603.0059, '2003': 4603.0058, '2004':...
3 {'2002': 0.0, '2003': 0.0, '2004': 0.0, '2005'...
4 {'2002': 367.2074, '2003': 375.3591, '2004': 3...
Name: commodity_threat_protected_areas, dtype: object
In [18]: expand_dic_col(col, ['year'], 'acres')
Out[18]:
acres
year
0 2002 634.7188
1 2002 16.4619
2 2002 4603.0059
3 2002 0.0000
4 2002 367.2074
... ...
28 2019 0.0000
29 2019 29.2045
30 2019 0.0000
31 2019 1825.2936
32 2019 0.0000
[594 rows x 1 columns]
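The result has one row per (original row, year) pair. A minimal sketch of the same kind of expansion, assuming each cell is a flat {key: value} dict with a single key level (the helper name and "row" label below are illustrative, not the library's internals):

# Illustrative expansion of a Series of {key: value} dicts into a long-format
# DataFrame with one value column; handles a single key level only.
import pandas as pd

def expand_dict_series(col, key_names, value_name):
    records = [
        (idx, key, value)
        for idx, d in col.items()
        for key, value in d.items()
    ]
    out = pd.DataFrame(records, columns=["row", *key_names, value_name])
    return out.set_index(["row", *key_names])

# expand_dict_series(col, ["year"], "acres") yields a long table like Out[18] above.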
Of the 46 columns in the ForestChangeDiagnostic output, some are nested and some are not. This information is encoded in the wrapper class, which applies the right transformation to each column. It acts like a dictionary of per-column DataFrames:
In [19]: from wri_list_tools import ForestChangeDiagnostic
In [20]: fcd = ForestChangeDiagnostic("tests/fcd_output")
In [21]: list(fcd.keys())
Out[21]:
['index',
'list_id',
'status_code',
'location_error',
'tree_cover_loss_total_yearly',
'tree_cover_loss_primary_forest_yearly',
'tree_cover_loss_peat_yearly',
'tree_cover_loss_intact_forest_yearly',
'tree_cover_loss_protected_areas_yearly',
'tree_cover_loss_sea_landcover_yearly',
'tree_cover_loss_idn_landcover_yearly',
'tree_cover_loss_soy_yearly',
'tree_cover_loss_idn_legal_yearly',
'tree_cover_loss_idn_forest_moratorium_yearly',
'tree_cover_loss_prodes_yearly',
'tree_cover_loss_prodes_wdpa_yearly',
'tree_cover_loss_prodes_primary_forest_yearly',
'tree_cover_loss_brazil_biomes_yearly',
'tree_cover_extent_total',
'tree_cover_extent_primary_forest',
'tree_cover_extent_protected_areas',
'tree_cover_extent_peat',
'tree_cover_extent_intact_forest',
'natural_habitat_primary',
'natural_habitat_intact_forest',
'total_area',
'protected_areas_area',
'peat_area',
'brazil_biomes',
'idn_legal_area',
'sea_landcover_area',
'idn_landcover_area',
'idn_forest_moratorium_area',
'south_america_presence',
'legal_amazon_presence',
'brazil_biomes_presence',
'cerrado_biome_presence',
'southeast_asia_presence',
'indonesia_presence',
'commodity_value_forest_extent',
'commodity_value_peat',
'commodity_value_protected_areas',
'commodity_threat_deforestation',
'commodity_threat_peat',
'commodity_threat_protected_areas',
'commodity_threat_fires']
In [23]: fcd['tree_cover_loss_total_yearly']
Out[23]:
tree_cover_loss_total_yearly
location_id year
26 2001 2055.8131
2 2001 777.0302
14 2001 4138.8128
11 2001 3399.9531
15 2001 824.4761
... ...
9 2020 2241.1601
13 2020 1068.4284
8 2020 2345.9443
25 2020 4122.2197
24 2020 1034.1295
[660 rows x 1 columns]
In [24]: fcd['tree_cover_loss_idn_landcover_yearly']
Out[24]:
tree_cover_loss_idn_landcover_yearly
location_id landcover year
11 Bare land 2001 610.3934
15 Bare land 2001 102.5131
7 Bare land 2001 0.8446
29 Bare land 2001 6.1497
31 Bare land 2001 3.8428
... ...
Mining 2020 0.4611
-1 Mining 2020 8.0742
28 Mining 2020 0.5383
20 Mining 2020 2.9997
9 Mining 2020 1.0001
[2360 rows x 1 columns]
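Each value is an ordinary pandas DataFrame with a MultiIndex, so the usual pandas operations apply, for example:

# Total yearly tree cover loss summed across all locations.
fcd['tree_cover_loss_total_yearly'].groupby(level='year').sum()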
Those attributes that don't need to be expanded are accessible as a simple table:
In [27]: fcd.attributes_df.columns
Out[27]:
Index(['index', 'list_id', 'status_code', 'location_error',
'tree_cover_extent_total', 'tree_cover_extent_primary_forest',
'tree_cover_extent_protected_areas', 'tree_cover_extent_peat',
'tree_cover_extent_intact_forest', 'natural_habitat_primary',
'natural_habitat_intact_forest', 'total_area', 'protected_areas_area',
'peat_area', 'brazil_biomes', 'idn_legal_area',
'idn_forest_moratorium_area', 'south_america_presence',
'legal_amazon_presence', 'brazil_biomes_presence',
'cerrado_biome_presence', 'southeast_asia_presence',
'indonesia_presence'],
dtype='object')
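Since attributes_df is a plain DataFrame, quick summaries are straightforward, for example (column names taken from the listing above):

# Summary statistics for a few scalar attribute columns.
fcd.attributes_df[['total_area', 'peat_area', 'protected_areas_area']].describe()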
The flat views are well suited to being imported into Excel for manual inspection:
In [28]: fcd.to_csv_dir("/tmp/fcd-out")
/tmp/fcd-out/tree_cover_loss_total_yearly.csv
/tmp/fcd-out/tree_cover_loss_primary_forest_yearly.csv
/tmp/fcd-out/tree_cover_loss_peat_yearly.csv
/tmp/fcd-out/tree_cover_loss_intact_forest_yearly.csv
/tmp/fcd-out/tree_cover_loss_protected_areas_yearly.csv
/tmp/fcd-out/tree_cover_loss_sea_landcover_yearly.csv
/tmp/fcd-out/tree_cover_loss_idn_landcover_yearly.csv
/tmp/fcd-out/tree_cover_loss_soy_yearly.csv
/tmp/fcd-out/tree_cover_loss_idn_legal_yearly.csv
/tmp/fcd-out/tree_cover_loss_idn_forest_moratorium_yearly.csv
/tmp/fcd-out/tree_cover_loss_prodes_yearly.csv
/tmp/fcd-out/tree_cover_loss_prodes_wdpa_yearly.csv
/tmp/fcd-out/tree_cover_loss_prodes_primary_forest_yearly.csv
/tmp/fcd-out/tree_cover_loss_brazil_biomes_yearly.csv
/tmp/fcd-out/sea_landcover_area.csv
/tmp/fcd-out/idn_landcover_area.csv
/tmp/fcd-out/commodity_value_forest_extent.csv
/tmp/fcd-out/commodity_value_peat.csv
/tmp/fcd-out/commodity_value_protected_areas.csv
/tmp/fcd-out/commodity_threat_deforestation.csv
/tmp/fcd-out/commodity_threat_peat.csv
/tmp/fcd-out/commodity_threat_protected_areas.csv
/tmp/fcd-out/commodity_threat_fires.csv
DONE!
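The exported CSVs are ordinary flat tables, so besides opening them in Excel they can be read back with pandas:

# Read one of the exported flat CSVs back into pandas.
import pandas as pd
pd.read_csv("/tmp/fcd-out/tree_cover_loss_total_yearly.csv")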
For visual inspection, the geometries from the input TSV can be combined with the result attributes and saved to a GeoPackage (GPKG). Each nested column gets its own layer, and the attributes are flattened so they are readable in QGIS:
In [1]: from wri_list_tools import ForestChangeDiagnostic, read_list_tsv
In [2]: input_df = read_list_tsv("tests/palm-generic-32.tsv")
In [3]: fcd = ForestChangeDiagnostic("tests/fcd_output")
In [4]: fcd.to_gpkg("/tmp/fcd.gpkg", input_df)
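The resulting GeoPackage can be opened directly in QGIS, or inspected layer by layer with GeoPandas. The layer name below is assumed to match the nested column name, as described above:

# Load one layer of the exported GeoPackage for inspection.
import geopandas as gpd
gpd.read_file("/tmp/fcd.gpkg", layer="tree_cover_loss_total_yearly")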