# Demo: Dataset introduction
 
This is an introduction to the MetObs toolkit. The examples use the demo data files that come with the toolkit.
Once the MetObs toolkit package is installed, you can import it as follows:

In [None]:
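# Only needed when running from a local clone instead of an installed package;
# adjust the path to your own checkout.
import sys
repodir = '/home/thoverga/Documents/VLINDER_github/MetObs_toolkit'
sys.path.insert(0, str(repodir))
import metobs_toolkit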

metobs_toolkit.__version__

## The Dataset

A dataset is a collection of `Station` objects and all the data and metadata they hold. Most methods are
applied directly to a dataset.

Start by creating an empty Dataset object:

In [None]:
your_dataset = metobs_toolkit.Dataset()
print(your_dataset)  # prints minimal info about your dataset

In [None]:
your_dataset.get_info()

## Importing data

To import your data into a Dataset, you can provide:

* (optional) data file: the CSV file containing the observations.
* (optional) metadata file: the CSV file containing metadata for all stations.
* (required) template file: a JSON file used to interpret your data file and your metadata file (if present).

In practice, you need to **start by creating a template file** for your data. More information on creating the template can be found in the documentation (under [Mapping to the toolkit](https://metobs-toolkit.readthedocs.io/en/latest/topics/template_mapping.html)).

**TIP**: *Use the `build_template_prompt()` function of the toolkit to create a template file.*


In [None]:
# metobs_toolkit.build_template_prompt()

To import data, you must specify the paths to your data, metadata and template.
For this example, we use the demo data, metadata and template that come with the toolkit.

In [None]:
your_dataset.import_data_from_file(
        input_data_file=metobs_toolkit.demo_datafile, # path to the data file
        input_metadata_file=metobs_toolkit.demo_metadatafile,
        template_file=metobs_toolkit.demo_template,
        )

your_dataset
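
Because the template is required while the data and metadata files are optional, it can help to validate the paths before importing. The sketch below is a minimal illustration using only the standard library; `check_input_files` is a hypothetical helper and not part of the MetObs toolkit. Its keyword names simply mirror the arguments used in the import call above.

```python
from pathlib import Path


def check_input_files(template_file, input_data_file=None, input_metadata_file=None):
    """Return the inputs as Path objects, raising if a given file is missing.

    Hypothetical helper, not part of the MetObs toolkit.
    """
    if template_file is None:
        raise ValueError("A template file is required.")

    paths = {
        "template_file": template_file,
        "input_data_file": input_data_file,
        "input_metadata_file": input_metadata_file,
    }
    checked = {}
    for name, value in paths.items():
        if value is None:
            checked[name] = None  # optional input that was not provided
            continue
        path = Path(value)
        if not path.is_file():
            raise FileNotFoundError(f"{name} does not exist: {path}")
        checked[name] = path
    return checked
```

Calling this helper with your own paths before `import_data_from_file()` gives an early, explicit error when a file is missing.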

## Inspecting the Template

The template translates your data and metadata files into the standard structure and units used by the toolkit, which is why you need one before importing. It is only used when importing data.
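
To picture what this translation means, here is a conceptual sketch using only the standard library. It is not the toolkit's template format or implementation: the raw column names, the `mapping` dictionary, the standard names and the assumption that temperatures are converted to °C are all made up for illustration.

```python
import csv
import io

# Hypothetical raw file: column names and units chosen by the data provider.
raw_csv = io.StringIO(
    "Station,Timestamp,Temp_F\n"
    "stationA,2022-09-01 00:00,68.0\n"
    "stationA,2022-09-01 00:10,66.2\n"
)

# Hypothetical mapping, playing the role of a template:
# raw column name -> (standard name, conversion to the standard unit).
mapping = {
    "Station": ("name", str),
    "Timestamp": ("datetime", str),
    "Temp_F": ("temp", lambda f: round((float(f) - 32) * 5 / 9, 2)),  # °F -> °C
}


def translate(row):
    """Rename the columns of one raw record and convert its values."""
    return {new: convert(row[old]) for old, (new, convert) in mapping.items()}


records = [translate(row) for row in csv.DictReader(raw_csv)]
print(records)
```

Each record now uses the same column names and units, whatever the layout of the raw file was.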

The template is read and stored as an attribute of your `Dataset`. In practice you do not have to interact with it, but if something unexpected happens when reading in the data, it can be useful to inspect the template, either as a file or with the `Dataset.template.get_info()` method.


In [None]:
your_dataset.template.get_info()

**TIP**: *The `get_info()` method works on all objects of the MetObs toolkit.*

## Inspecting the Data

To get an overview of the data stored in your Dataset, you can use:

In [None]:
print(your_dataset)

Or you can use the `.get_info()` method to print out more details.

If you want to inspect the data in your Dataset directly, you can take a look at the `.df` and `.metadf` attributes:

In [None]:
print(your_dataset.df.head())
# equivalent for the metadata
print(your_dataset.metadf.head())


### Inspecting a Station

If you are interested in one station, you can extract all the info for that one station from the dataset by:


In [None]:
favorite_station = your_dataset.get_station(stationname="vlinder02")

`favorite_station` now contains all the information of that one station. All methods that are applicable to a Dataset are also applicable to a Station. So to inspect your favorite station, you can use:

In [None]:
favorite_station.get_info()

## Making timeseries plots

To make timeseries plots, use the following syntax to plot the *temperature* observations of the full Dataset:

In [None]:
%config InlineBackend.print_figure_kwargs = {'bbox_inches':None} # otherwise the legend is cut off in IPython inline plots

In [None]:
your_dataset.make_plot(obstype='temp')

See the documentation of the `make_plot()` method for more details. Here is an example with commonly used arguments.

In [None]:
# Import datetime from the standard library to build the start and end timestamps
from datetime import datetime

your_dataset.make_plot(
    # specify the names of the stations in a list, or use None to plot all of them.
    stationnames=['vlinder01', 'vlinder03', 'vlinder05'],
    # what obstype to plot (default is 'temp')
    obstype="humidity",
    # choose how to color the timeseries:
    #'name' : a specific color per station
    #'label': a specific color per quality control label
    colorby="label",
    # choose a start and endtime for the series (datetime).
    # Default is None, which uses all available data
    starttime=None,
    endtime=datetime(2022, 9, 9),
    # Specify a title if you do not want the default title
    title='your custom title',
    # Add a legend to the plot (default is True)
    legend=True,
    # Plot observations that are labeled as outliers.
    show_outliers=True,
)
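
The `starttime` and `endtime` arguments above are plain `datetime` objects, so a plotting window can be computed instead of typed out. The sketch below uses only the standard library; `day_window` is a hypothetical helper, not part of the MetObs toolkit, and it uses timezone-naive timestamps like the example above.

```python
from datetime import datetime, timedelta


def day_window(day, n_days=1):
    """Return (starttime, endtime) spanning n_days from midnight of `day`.

    Hypothetical helper, not part of the MetObs toolkit.
    """
    start = datetime(day.year, day.month, day.day)
    return start, start + timedelta(days=n_days)


# Two full days, starting at midnight of 4 September 2022.
starttime, endtime = day_window(datetime(2022, 9, 4, 15, 30), n_days=2)
print(starttime, endtime)
```

Assuming your dataset covers these dates, the result can be passed on as `your_dataset.make_plot(obstype='temp', starttime=starttime, endtime=endtime)`.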

As mentioned above, the same methods can be applied to a Station object:

In [None]:
favorite_station.make_plot(colorby='label')

## Resampling the time resolution

Coarsening the time resolution (i.e. lowering the frequency) of your data can be done with the `coarsen_time_resolution()` method.

In [None]:
your_dataset.coarsen_time_resolution(freq='30min') #'30min' means 30 minutes

your_dataset.df.head()
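
To illustrate what coarsening a time resolution means, here is a pandas-only sketch on a toy series. It is not the toolkit's implementation, which may select or match timestamps differently; the values are made up, and keeping the first record of each 30-minute bin is an assumption chosen for simplicity.

```python
import pandas as pd

# Toy series at a 10-minute resolution (values are made up for illustration).
timestamps = pd.date_range("2022-09-01 00:00", periods=7, freq="10min")
temp = pd.Series([20.0, 19.8, 19.6, 19.4, 19.2, 19.0, 18.8], index=timestamps)

# Keep the first record of every 30-minute bin: 00:00, 00:30 and 01:00.
coarse = temp.resample("30min").first()
print(coarse)
```

The seven 10-minute records are reduced to three records, one for each 30-minute timestamp.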