# Demo: Using Google Earth Engine


This example continues the previous one: [Using a Dataset](https://metobs-toolkit.readthedocs.io/en/latest/examples/doc_example.html). It demonstrates how to get metadata from the Google Earth Engine (GEE).

Before proceeding, make sure you have **set up a Google developers account and a GEE project**. See [Using Google Earth Engine](https://metobs-toolkit.readthedocs.io/en/latest/topics/gee_authentication.html#using-google-earth-engine) for a detailed description of this.

There are two classes that facilitate the interaction between GEE and the metobs_toolkit:

 * `GeeStaticDataset`: This class handles GEE datasets that do not have a time dimension (static). It is used to extract GEE dataset values at the station locations (or in buffers around them).
 * `GeeDynamicDataset`: This class handles GEE datasets that have a time dimension. It is used to extract timeseries of GEE dataset values at the station locations.

Both classes can hold metadata (the coordinates of the stations), and the `GeeDynamicDataset` class can also hold timeseries data.

In this example, we first demonstrate the `GeeStaticDataset` to extract extra metadata for the stations. Then we demonstrate how to extract timeseries at the station locations using `GeeDynamicDataset`.

## Create your Dataset

Create a dataset with the demo data.

In [None]:
import metobs_toolkit

your_dataset = metobs_toolkit.Dataset()
your_dataset.update_file_paths(
    input_data_file=metobs_toolkit.demo_datafile, # path to the data file
    input_metadata_file=metobs_toolkit.demo_metadatafile,
    template_file=metobs_toolkit.demo_template,
)

your_dataset.import_data_from_file()

## Extracting LCZ from GEE

Here is an example of how to extract the Local Climate Zone (LCZ) information of your stations. First, we take a look at what is present in the metadata of the dataset. 

In [None]:
your_dataset.metadf.head()

To extract geospatial information for your stations, the **lat** and **lon** (latitude and longitude)
of your stations must be present in the metadf. If so, then geospatial
information will be extracted from GEE at these locations.
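
Because the extraction depends on usable coordinates, it can help to check them before sending a request. The following is a minimal sketch using only the standard library. The helper `find_invalid_coordinates` and the station names and coordinates are hypothetical and are not part of the toolkit:

```python
import math


def find_invalid_coordinates(stations):
    """Return the names of stations whose lat/lon are missing or out of range.

    `stations` maps a station name to a (lat, lon) tuple in decimal degrees.
    """
    invalid = []
    for name, (lat, lon) in stations.items():
        # A coordinate is usable only if it is a real number (not None or NaN)
        values_ok = all(
            isinstance(v, (int, float)) and not math.isnan(v) for v in (lat, lon)
        )
        if not (values_ok and -90 <= lat <= 90 and -180 <= lon <= 180):
            invalid.append(name)
    return invalid


# Hypothetical coordinates, for illustration only
stations = {
    "station_a": (51.0, 3.7),
    "station_b": (float("nan"), 3.8),  # missing latitude
    "station_c": (51.2, 190.0),        # longitude out of range
}
print(find_invalid_coordinates(stations))
```

With a real dataset, the mapping could be built from the **lat** and **lon** columns of the metadf.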

To extract the Local Climate Zones (LCZs) of your stations:

In [None]:
lcz_values = your_dataset.get_lcz()
# The LCZs for all your stations are extracted
print(lcz_values)

The first time in each session, you are asked to authenticate with Google.
Select the Google account and billing project that you have set up, and accept the terms and conditions.

*NOTE: For small data requests, the read-only scopes are sufficient. For large data requests they are insufficient, because the data will be written directly to your Google Drive.*

The metadata of your dataset is also updated:

In [None]:
print(your_dataset.metadf['lcz'].head())

To make a geospatial plot you can use the following method:

In [None]:
your_dataset.make_geo_plot(variable="lcz")

## Extracting other Geospatial information

Similar to the LCZ extraction, you can extract the altitude of the stations (from a digital elevation model):

In [None]:
altitudes = your_dataset.get_altitude() #The altitudes are in meters above sea level.
print(altitudes)

A more detailed description of the landcover/land use in the microenvironment can be extracted in the form of landcover fractions in a circular buffer for each station.

You can choose to aggregate the landcover classes to water, pervious and impervious, or set `aggregate=False` to extract the landcover classes as present in the worldcover_10m dataset.

In [None]:
aggregated_landcover = your_dataset.get_landcover(
    buffers=[100, 250], # a list of buffer radii in meters
    aggregate=True, # if True, aggregate landcover classes to water, pervious and impervious
)

print(aggregated_landcover)
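
To illustrate what aggregating landcover classes means, here is a minimal sketch that sums per-class cover fractions within one buffer into the three groups. The class codes are those of the ESA WorldCover legend. The grouping itself, the helper `aggregate_fractions`, and the example fractions are assumptions for illustration, and the toolkit's own aggregation scheme may differ:

```python
# ESA WorldCover class codes -> aggregated group.
# ASSUMPTION: this grouping is illustrative; the toolkit's scheme may differ.
AGGREGATION = {
    10: "pervious",    # tree cover
    20: "pervious",    # shrubland
    30: "pervious",    # grassland
    40: "pervious",    # cropland
    50: "impervious",  # built-up
    60: "pervious",    # bare / sparse vegetation
    70: "water",       # snow and ice
    80: "water",       # permanent water bodies
    90: "water",       # herbaceous wetland
    95: "water",       # mangroves
    100: "pervious",   # moss and lichen
}


def aggregate_fractions(class_fractions):
    """Sum per-class cover fractions into water/pervious/impervious totals."""
    totals = {"water": 0.0, "pervious": 0.0, "impervious": 0.0}
    for code, fraction in class_fractions.items():
        totals[AGGREGATION[code]] += fraction
    return totals


# Hypothetical cover fractions inside one buffer around a station
buffer_100m = {10: 0.25, 30: 0.25, 50: 0.375, 80: 0.125}
print(aggregate_fractions(buffer_100m))
```

The aggregated fractions still sum to one, so they can be compared across stations and buffer radii.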

### Plotting with GeeStaticDataset 

You can make an interactive plot of a `GeeStaticDataset` by using the `GeeStaticDataset.make_gee_plot()` method.

*Note: Not all Python environments can visualize the interactive map. If that is the case for you, you can save the map
as a (html) file, and open it with your browser.*

In [None]:
# The (default) Gee Modeldata is stored in all Datasets
your_dataset.gee_datasets

We can see that 'lcz' is a known `GeeStaticDataset`, so we can plot it.

In [None]:
lcz_map = your_dataset.gee_datasets['lcz']

In [None]:
#Make the plot, using the GeeStaticDataset selected above
your_dataset.make_gee_static_spatialplot(Model=lcz_map)

*Note: In the online documentation, no GEE data is shown. This is because the map is interactive and requires GEE authentication, which is not publicly available.*

## Extracting ERA5 timeseries 

Now we demonstrate how to use the `GeeDynamicDataset`, with the ERA5-land (GEE) dataset as an example.
All GEE datasets available to your Dataset are stored in its `gee_datasets` attribute.

In [None]:
your_dataset.gee_datasets

In [None]:
era5_model = your_dataset.gee_datasets['ERA5-land']
era5_model.get_info()

We can see that *temp* is a known `ModelObstype` present in `era5_model`. Thus we can use it to extract temperature timeseries.

If the target `ModelObstype` is not present, create the `ModelObstype` and add it with `GeeDynamicDataset.add_modelobstype()`.


The toolkit has built-in functionality to extract ERA5 time series at the station locations. The ERA5 data will be stored in a [Modeldata](https://metobs-toolkit.readthedocs.io/en/latest/reference/modeldata.html) instance. Here is an example of how to get the ERA5 time series by using the ``get_modeldata()`` method.

In [None]:
%config InlineBackend.print_figure_kwargs = {'bbox_inches':None} #else the legend is cutoff in ipython inline plots

In [None]:
#Get the ERA5 data for a single station (to reduce data transfer)
your_station = your_dataset.get_station('vlinder02')

#Extract time series at the location of the station
ERA5_data = your_station.get_modeldata(Model=era5_model, 
                                       obstypes=['temp'], 
                                       startdt=None, #if None, the start of the observations is used 
                                       enddt=None, #if None, the end of the observations is used 
                                       get_all_bands=False #if True, all the GEE bands are extracted.
                                       )

#Get info
print(ERA5_data)
ERA5_data.make_plot(obstype_model='temp', 
                    Dataset=your_station, #add the observations to the same plot 
                    obstype_dataset='temp')
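
The `startdt=None` and `enddt=None` defaults above fall back to the period covered by the observations. A minimal sketch of that fallback logic, using only the standard library, is given here. The helper `resolve_period` and the timestamps are hypothetical, and this is not the toolkit's implementation:

```python
from datetime import datetime


def resolve_period(obs_timestamps, startdt=None, enddt=None):
    """Fall back to the first/last observation when a bound is None."""
    start = min(obs_timestamps) if startdt is None else startdt
    end = max(obs_timestamps) if enddt is None else enddt
    if start > end:
        raise ValueError("startdt must not be later than enddt")
    return start, end


# Hypothetical observation timestamps, for illustration only
obs = [
    datetime(2022, 9, 1, 0),
    datetime(2022, 9, 8, 12),
    datetime(2022, 9, 15, 23),
]

# Both bounds default to the extent of the observations
print(resolve_period(obs))

# An explicit start overrides the default; the end still falls back
print(resolve_period(obs, startdt=datetime(2022, 9, 10)))
```

Requesting only the period that is actually needed keeps the amount of transferred data small.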


### GEE data transfer

There is a limit to the amount of data that can be transferred directly from GEE. When the data cannot be transferred directly, **it will be written to a file on your Google Drive**. The location of the file will be printed out. Once the file has been written, you must download it and import it into an empty *Modeldata* instance using the ``set_model_from_csv()`` method.

In [None]:
#Illustration
#Extract time series at the locations of all the stations
ERA5_data = your_dataset.get_modeldata(Model=era5_model, 
                                       obstypes=['temp'], 
                                       startdt=None, #if None, the start of the observations is used 
                                       enddt=None, #if None, the end of the observations is used 
                                       force_to_drive=True, #We can force it to use Google Drive
                                       )

#Because the data amount is too large (or because of the force_to_drive),
# it will be written to a file on your Google Drive! The returned Modeldata is empty.
print(ERA5_data)


In [None]:
#See the output to find the modeldata in your Google Drive, and download the file.
#Update the empty Modeldata with the data from the file

#ERA5_data.set_model_from_csv(csvpath='/home/..../era5_data.csv') #The path to the downloaded file
#print(ERA5_data)

## Plotting with GeeDynamicDataset

You can make an interactive spatial plot to visualize the stations by using the ``make_gee_plot()`` method.

In [None]:
str(your_dataset)


In [None]:
import datetime
at_time = datetime.datetime(2022,9,12,16)
print(at_time)

spatial_map = your_dataset.make_gee_dynamic_spatialplot(
                            timeinstance=at_time,
                            Model=era5_model,
                            modelobstype='temp',
                            vmin=None, #if None, the toolkit makes a guess based on your stations
                            vmax=None) #if None, the toolkit makes a guess based on your stations
spatial_map

*Note: In the online documentation, no GEE data is shown. This is because the map is interactive and requires GEE authentication, which is not publicly available.*