# Demo: Analysis

This example continues the previous one, [filling gaps](https://metobs-toolkit.readthedocs.io/en/latest/examples/filling_example.html), and serves as an introduction to the Analysis module. The MetObs-toolkit provides an ``Analysis`` class that holds some common methods used in research.

To start, we import the demo dataset.

In [None]:
import metobs_toolkit
dataset = metobs_toolkit.Dataset() #Create a new dataset object

#Load the data
dataset.import_data_from_file(
                    template_file=metobs_toolkit.demo_template, #The template file
                    input_data_file=metobs_toolkit.demo_datafile, #The data file
                    input_metadata_file=metobs_toolkit.demo_metadatafile, #The metadata file
                    )

Later in this demo we will use some landcover information, so we extract it for all stations in the dataset.

In [None]:
#Get the LCZ of all stations; it will be used later on
_lczseries = dataset.get_LCZ()


## Creating an Analysis

The built-in analysis functionality is centered around the ``Analysis`` class. This class holds only records that are assumed to be correct, so no QC outliers or gaps are defined. The data held by an `Analysis` is stored in a single dataframe.


We can create an `Analysis` instance from a ``Dataset`` (or from a ``Station``). 



In [None]:
analysis = metobs_toolkit.Analysis(Dataholder=dataset)
analysis.get_info()

In [None]:
analysis.df.index.get_level_values('datetime').tz


We can inspect the stored data from the ``Analysis.df`` and ``Analysis.metadf`` attributes.

In [None]:
analysis.df.head(10)

In [None]:
analysis.metadf.head(10)

## Analysis methods

An overview of the available analysis methods can be found in the documentation of the ``Analysis`` class. The relevant methods depend on your data and your interests. As an example, we demonstrate filtering and the diurnal cycle on the demo data.



### Filtering data

It is common to filter your data according to specific meteorological phenomena or periods in time. To do this you can use the ``apply_filter_on_records()`` method. 

*NOTE*: Filtering removes records from the `Analysis` in place; records that do not match the filter are dropped.

In [None]:
print(f'The initial number of records: {analysis.df.shape[0]}')

#Filter to non-windy afternoons
analysis.apply_filter_on_records('(wind_speed <= 2.5) & (hour > 12) & (hour < 20)')

#We can apply multiple consecutive filters
analysis.apply_filter_on_records('season=="autumn" | season=="winter"') #Mind the quotes: string values need their own quotes inside the expression


print(f'The number of records after filtering: {analysis.df.shape[0]}')
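
The filter strings above read like pandas query expressions. As a minimal sketch of that idea, assuming the strings follow `DataFrame.query` syntax (whether ``apply_filter_on_records()`` uses `query` internally is an assumption), the same two consecutive filters can be tried on a small toy dataframe. The toy values are hypothetical and only mirror the column names used above.

```python
import pandas as pd

# Hypothetical toy records; column names mirror those in the filter strings above.
records = pd.DataFrame(
    {
        "wind_speed": [1.0, 3.2, 2.0, 0.5, 2.5],
        "hour": [14, 15, 9, 18, 13],
        "season": ["autumn", "autumn", "winter", "summer", "winter"],
    }
)

# First filter: low wind speed during the afternoon hours.
afternoon = records.query("(wind_speed <= 2.5) & (hour > 12) & (hour < 20)")

# Second filter, applied to the already filtered result.
# The outer single quotes delimit the expression, the inner double quotes mark string values.
filtered = afternoon.query('(season == "autumn") | (season == "winter")')

print(len(records), len(afternoon), len(filtered))
```

Each filter only ever shrinks the set of records, which is why the order of consecutive filters does not change the final result.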


We can also filter on the metadata by using the ``apply_filter_on_metadata()`` method.

In [None]:
analysis.apply_filter_on_metadata("LCZ == 'Large lowrise'")
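
Conceptually, a metadata filter selects stations first and then keeps only the records of those stations. The following is a rough sketch of that idea in plain pandas, not the toolkit's implementation. The station names and temperature values are hypothetical.

```python
import pandas as pd

# Hypothetical station metadata, indexed by station name.
metadf = pd.DataFrame(
    {"LCZ": ["Large lowrise", "Dense trees", "Large lowrise"]},
    index=pd.Index(["station_a", "station_b", "station_c"], name="name"),
)

# Hypothetical records, one row per observation.
records = pd.DataFrame(
    {
        "name": ["station_a", "station_b", "station_c", "station_b"],
        "temp": [14.2, 12.8, 15.1, 13.0],
    }
)

# Step 1: select the stations that match the metadata condition.
kept_meta = metadf.query("LCZ == 'Large lowrise'")

# Step 2: keep only the records that belong to the selected stations.
kept_records = records[records["name"].isin(kept_meta.index)]

print(kept_records)
```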

## Diurnal cycle 

To make a diurnal cycle plot of your `Analysis`, use the ``plot_diurnal_cycle()`` method:

In [None]:
analysis.plot_diurnal_cycle(colorby='name', #each station is plotted, and colored differently
                                trgobstype='humidity', 
                                return_data = False,
                                )

*Note*: Be aware that we filtered the data to low-wind afternoons in autumn and winter, and to stations in the 'Large lowrise' LCZ!

If you want to work with the aggregated values, you can use the ``aggregate_df()`` method. As an illustration, we start from a new, unfiltered `Analysis` to have more variability in the data. Then we aggregate the data per LCZ and hour.

In [None]:
import numpy as np

#Start with an unfiltered analysis
analysis=metobs_toolkit.Analysis(Dataholder=dataset) 

aggdf = analysis.aggregate_df(trgobstype='temp',
                      agg=["LCZ", "hour"], #by adding hour, we keep the diurnal variation
                      method=np.mean) #the aggregation function to use.
aggdf
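
To get a feeling for what such an aggregation does, here is a minimal sketch with a plain pandas `groupby` on hypothetical toy data. It is not the toolkit's implementation. It only shows how grouping by both `LCZ` and `hour` yields one mean value per (LCZ, hour) pair.

```python
import pandas as pd

# Hypothetical toy observations with the grouping columns already present.
toy = pd.DataFrame(
    {
        "LCZ": ["Large lowrise", "Large lowrise", "Dense trees", "Dense trees"],
        "hour": [0, 0, 0, 12],
        "temp": [10.0, 12.0, 8.0, 15.0],
    }
)

# One mean temperature per (LCZ, hour) combination.
toy_agg = toy.groupby(["LCZ", "hour"])["temp"].agg("mean")

print(toy_agg)
```

Dropping `"hour"` from the grouping keys would collapse the diurnal variation into a single mean per LCZ.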

### Diurnal cycle of differences

The diurnal cycle of differences is also implemented. The values are the aggregated diurnal differences with respect to a reference station.

As an example the diurnal temperature difference cycle is plotted with station *vlinder02* as the reference. The aggregation is done per LCZ.

In [None]:
analysis.plot_diurnal_cycle_with_reference_station(ref_station='vlinder02',
                                                  trgobstype='temp',
                                                  colorby='LCZ')
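
The idea behind a diurnal cycle of differences can be sketched in plain pandas: subtract the reference series at matching timestamps, then average the differences per hour of the day. The sketch below uses hypothetical station names and values, and it is an illustration of the concept rather than the toolkit's actual computation.

```python
import pandas as pd

# Hypothetical temperature series for a reference station and one other station.
times = pd.to_datetime(
    ["2022-09-01 00:00", "2022-09-01 12:00", "2022-09-02 00:00", "2022-09-02 12:00"]
)
temp = pd.DataFrame(
    {
        "ref": [10.0, 20.0, 12.0, 22.0],
        "station_a": [11.0, 20.5, 14.0, 23.5],
    },
    index=times,
)

# Difference with respect to the reference station at each timestamp.
diff = temp.sub(temp["ref"], axis=0)

# Average the differences per hour of the day to obtain the diurnal cycle.
diurnal_diff = diff.groupby(diff.index.hour).mean()

print(diurnal_diff)
```

By construction the reference column is zero at every hour, so only the other stations carry information in such a plot.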
