# Introduction to the MetObs-toolkit

In this introduction, you will learn about the main components and methods of the MetObs-toolkit. Let's start by importing it.

Since this package is under development, it is often relevant to know the precise version of the toolkit.

In [9]:
import metobs_toolkit

#Print out the version of the toolkit
print(metobs_toolkit.__version__)

0.4.0a
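
When reporting an issue or comparing results, it can also help to read the installed version without importing the package. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the distribution name `metobs-toolkit` is an assumption about how the package is registered in your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# "metobs-toolkit" is an assumed distribution name; adjust it if your install differs.
toolkit_version = installed_version("metobs-toolkit")
if toolkit_version is None:
    print("MetObs-toolkit does not appear to be installed")
else:
    print(f"MetObs-toolkit version: {toolkit_version}")
```

Returning `None` instead of raising keeps the check usable in scripts that should continue when the package is missing.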


## The Dataset class

The `Dataset` class is the most important class for most applications. It holds all your stations and their data, so a `Dataset` is in principle a collection of stations.

Since raw data files often include observations from multiple stations, we always import raw data directly into a `Dataset`, using the `Dataset.import_data_from_file()` method.

A key component for importing raw data is a description of what your data represents and how it is formatted. You provide this as a **template file** that describes how your raw data is structured.



### Importing your raw data

As an example, we will import a demo file of raw observations. To do that, we need to:

* Create a template file for our raw data file. The `build_template_prompt` function guides you through this process: it asks a series of questions and, once you have answered them, creates a template file. It also proposes some code that you can use to import your data.
* Create a `Dataset` instance.
* Add the raw data to the `Dataset`.

In [10]:
# Specify the path to your raw data file (we use the demo file as example)
path_to_datafile=metobs_toolkit.demo_datafile

# We will also use a metadata file
path_to_metadatafile=metobs_toolkit.demo_metadatafile

In [11]:
%%script true

#Create a template for these data files
metobs_toolkit.build_template_prompt()

In [12]:
# Specify the path to the template file that was created (we use the demo template as example)
path_to_templatefile=metobs_toolkit.demo_template

Now that we have the data files and the template file, we create an empty `Dataset` and import the data into it.

In [13]:
dataset = metobs_toolkit.Dataset()  # Create a new, empty Dataset

# Load the data
dataset.import_data_from_file(
    template_file=path_to_templatefile,  # The template file
    input_data_file=path_to_datafile,  # The data file
    input_metadata_file=path_to_metadatafile,  # The metadata file
)

INFO:/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/metobs_toolkit/dataset.py:Reading the templatefile
INFO:/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/metobs_toolkit/io_collection/dataparser.py:Initializing DataParser with <metobs_toolkit.io_collection.filereaders.CsvFileReader object at 0x7ff463fa40a0> and <metobs_toolkit.template.Template object at 0x7ff421647ca0>.
INFO:/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/metobs_toolkit/io_collection/dataparser.py:Entering parse method of <metobs_toolkit.io_collection.dataparser.DataParser object at 0x7ff411cd5780>.
DEBUG:metobs_toolkit.io_collection.filereaders:Reading /home/thoverga/Documents/VLINDER_github/MetObs_toolkit/metobs_toolkit/datafiles/demo_datafile.csv to Dataframe.
DEBUG:/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/metobs_toolkit/io_collection/dataparser.py:Raw data read successfully.
DEBUG:/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/metobs_toolkit/io_collection/dataparser.py:

As the printed logs show, a lot happens when importing the data. That is because tests are applied to your data to check for gaps and for mismatches between the data and the metadata.

We can now inspect the `dataset` further.

## The dataset attributes

The attributes hold the data of the dataset. Here we present some attributes that are useful to inspect.



<div class="alert alert-block alert-info">
All classes in the MetObs-toolkit have a `get_info` method that prints an overview of their content.
</div>

* `Dataset.obstypes`: A collection of the known `Obstype` instances. Each observation type describes a measurable quantity and its corresponding units.

In [14]:
dataset.obstypes

{'temp': Obstype instance of temp,
 'humidity': Obstype instance of humidity,
 'radiation_temp': Obstype instance of radiation_temp,
 'pressure': Obstype instance of pressure,
 'pressure_at_sea_level': Obstype instance of pressure_at_sea_level,
 'precip': Obstype instance of precip,
 'precip_sum': Obstype instance of precip_sum,
 'wind_speed': Obstype instance of wind_speed,
 'wind_gust': Obstype instance of wind_gust,
 'wind_direction': Obstype instance of wind_direction}
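
To make the idea of "a measurable quantity and its corresponding units" concrete, here is a minimal sketch of such a pairing. `SimpleObstype` is a hypothetical stand-in written for this illustration and is not the toolkit's `Obstype` class. The choice of m/s as the standard unit for wind speed is also an assumption; the `wind_speed` value 1.972222 shown later in `dataset.df` is consistent with 7.1 km/h expressed in m/s, but the source does not state which conversion is applied.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleObstype:
    """Hypothetical stand-in for an observation type: a quantity plus its units."""

    name: str
    standard_unit: str
    # Multiplicative factors that convert a raw unit to the standard unit.
    to_standard: dict = field(default_factory=dict)

    def convert(self, value, raw_unit):
        """Convert a raw value to the standard unit."""
        if raw_unit == self.standard_unit:
            return value
        if raw_unit not in self.to_standard:
            raise ValueError(f"unknown unit {raw_unit!r} for obstype {self.name!r}")
        return value * self.to_standard[raw_unit]


# Assumption: wind speed is stored in m/s; 1 km/h equals 1/3.6 m/s.
wind_speed = SimpleObstype("wind_speed", "m/s", {"km/h": 1 / 3.6})

converted = wind_speed.convert(7.1, "km/h")
print(round(converted, 6))
```

The sketch only handles multiplicative conversions; units with an offset, such as Kelvin to degrees Celsius, would need a richer conversion rule.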

* `Dataset.template`: A template object that is set up automatically from the template file. It is only used when data is imported from a file and has no further use.

In [15]:
template = dataset.template

template.get_info() # Prints out how the template maps raw data

------ Data obstypes map ---------
 * temp            <---> Temperatuur    
     (raw data in degC)
     (description: 2mT passive)

 * humidity        <---> Vochtigheid    
     (raw data in percent)
     (description: 2m relative humidity passive)

 * wind_speed      <---> Windsnelheid   
     (raw data in km/h)
     (description: Average 2m  10-min windspeed)

 * wind_direction  <---> Windrichting   
     (raw data in degrees)
     (description: Average 2m  10-min windspeed,  ...)


------ Data extra mapping info ---------
 * name column (data) <---> Vlinder

------ Data timestamp map ---------
 * datetimecolumn  <---> None           
 * time_column     <---> Tijd (UTC)     
 * date_column     <---> Datum          
 * fmt             <---> %Y-%m-%d %H:%M:%S
 * Timezone        <---> UTC

------ Metadata map ---------
 * name            <---> Vlinder        
 * lat             <---> lat            
 * lon             <---> lon            
 * school          <---> school         
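
The overview above can be read as a set of renaming and parsing rules. The sketch below applies those rules to a single raw record with plain Python, purely to illustrate what the mapping expresses. `standardize_row` is a hypothetical helper, the record values are made up, and joining the date and time columns with a space before parsing is an assumption; the toolkit's own parser may work differently. Unit conversion is left out.

```python
from datetime import datetime, timezone

# Column mapping copied from the template overview above (raw name -> standard name).
OBSTYPE_COLUMNS = {
    "Temperatuur": "temp",
    "Vochtigheid": "humidity",
    "Windsnelheid": "wind_speed",
    "Windrichting": "wind_direction",
}
NAME_COLUMN = "Vlinder"
DATE_COLUMN = "Datum"
TIME_COLUMN = "Tijd (UTC)"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def standardize_row(raw_row):
    """Translate one raw record into standard names (hypothetical helper)."""
    # Assumption: the date and time columns are joined with a space before parsing.
    stamp = datetime.strptime(
        f"{raw_row[DATE_COLUMN]} {raw_row[TIME_COLUMN]}", TIMESTAMP_FORMAT
    ).replace(tzinfo=timezone.utc)
    record = {"name": raw_row[NAME_COLUMN], "datetime": stamp}
    for raw_name, standard_name in OBSTYPE_COLUMNS.items():
        record[standard_name] = float(raw_row[raw_name])
    return record


# Illustrative raw record; the values are made up for this sketch.
raw_row = {
    "Vlinder": "vlinder01",
    "Datum": "2022-09-01",
    "Tijd (UTC)": "00:00:00",
    "Temperatuur": "18.8",
    "Vochtigheid": "65",
    "Windsnelheid": "5.6",
    "Windrichting": "65",
}
record = standardize_row(raw_row)
print(record["name"], record["datetime"].isoformat(), record["humidity"])
```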


* `dataset.df`: A pandas DataFrame holding all the observation records.

In [16]:
dataset.df

| datetime | obstype | name | value | label |
| --- | --- | --- | --- | --- |
| 2022-09-01 00:00:00+00:00 | humidity | vlinder01 | 65.000000 | ok |
| 2022-09-01 00:00:00+00:00 | humidity | vlinder02 | 62.000000 | ok |
| 2022-09-01 00:00:00+00:00 | humidity | vlinder03 | 65.000000 | ok |
| 2022-09-01 00:00:00+00:00 | humidity | vlinder04 | 66.000000 | ok |
| 2022-09-01 00:00:00+00:00 | humidity | vlinder05 | 61.000000 | ok |
| ... | ... | ... | ... | ... |
| 2022-09-15 23:55:00+00:00 | wind_speed | vlinder24 | 0.000000 | ok |
| 2022-09-15 23:55:00+00:00 | wind_speed | vlinder25 | 1.972222 | ok |
| 2022-09-15 23:55:00+00:00 | wind_speed | vlinder26 | 0.027778 | ok |
| 2022-09-15 23:55:00+00:00 | wind_speed | vlinder27 | 0.000000 | ok |
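
The index levels in the output above (`datetime`, `obstype`, `name`) make it possible to select records with standard pandas tools. The following is a minimal sketch on a toy frame built from a few of the rows above. It assumes that `dataset.df` is an ordinary pandas DataFrame with a three-level MultiIndex, as the output suggests; `toy_df` is a hypothetical name used only here.

```python
import pandas as pd

# Toy records copied from the output above; the real frame holds many more rows.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2022-09-01 00:00:00", tz="UTC"), "humidity", "vlinder01"),
        (pd.Timestamp("2022-09-01 00:00:00", tz="UTC"), "humidity", "vlinder02"),
        (pd.Timestamp("2022-09-15 23:55:00", tz="UTC"), "wind_speed", "vlinder25"),
        (pd.Timestamp("2022-09-15 23:55:00", tz="UTC"), "wind_speed", "vlinder26"),
    ],
    names=["datetime", "obstype", "name"],
)
toy_df = pd.DataFrame(
    {"value": [65.0, 62.0, 1.972222, 0.027778], "label": ["ok"] * 4},
    index=index,
)

# Cross-section on one index level: all humidity records.
humidity = toy_df.xs("humidity", level="obstype")

# Cross-section on two levels at once: humidity of a single station.
vlinder01_humidity = toy_df.xs(
    ("humidity", "vlinder01"), level=["obstype", "name"]
)

print(humidity)
print(vlinder01_humidity["value"])
```

With the real data, the same calls would be made on `dataset.df` instead of `toy_df`.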


* `dataset.metadf`: A pandas DataFrame holding all the metadata of the stations.

In [17]:
dataset.metadf

| name | lat | lon | school | geometry |
| --- | --- | --- | --- | --- |
| vlinder01 | 50.980438 | 3.815763 | UGent | POINT (3.81576 50.98044) |
| vlinder02 | 51.022379 | 3.709695 | UGent | POINT (3.7097 51.02238) |
| vlinder03 | 51.324583 | 4.952109 | Heilig Graf | POINT (4.95211 51.32458) |
| vlinder04 | 51.335522 | 4.934732 | Heilig Graf | POINT (4.93473 51.33552) |
| vlinder05 | 51.052655 | 3.675183 | Sint-Barbara | POINT (3.67518 51.05266) |
| vlinder06 | 51.0271 | 4.5163 | BimSem | POINT (4.5163 51.0271) |
| vlinder07 | 51.030889 | 4.478445 | PTS | POINT (4.47844 51.03089) |
| vlinder08 | 51.02813 | 4.477398 | TSM | POINT (4.4774 51.02813) |
| vlinder09 | 50.927167 | 4.075722 | SMI | POINT (4.07572 50.92717) |
| vlinder10 | 50.935556 | 4.041389 | SMI | POINT (4.04139 50.93556) |
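
Comparing the `lat` and `lon` columns with the `geometry` column shows that each point is printed with the longitude first and the latitude second. The sketch below parses the printed text of one point to make that order explicit. `parse_point` is a hypothetical helper that works on the printed string only; it is not how the toolkit or pandas accesses the coordinates.

```python
import re

# Matches the printed form of a two-dimensional point, e.g. "POINT (3.81576 50.98044)".
POINT_PATTERN = re.compile(
    r"POINT \(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"
)


def parse_point(text):
    """Return (lon, lat) parsed from the printed text of a point."""
    match = POINT_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a simple point: {text!r}")
    x, y = (float(group) for group in match.groups())
    return x, y  # x is the longitude, y is the latitude


lon, lat = parse_point("POINT (3.81576 50.98044)")
print(f"lon={lon}, lat={lat}")
```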
