# Quick start

This notebook presents the **basic functionalities** of the `uinauil` package.

## 1. Import 

How to import the `uinauil` package:

In [1]:
#!pip install elg
#!pip install scikit-learn
from evalita4elg.uinauil.src import uinauil as ul
#import uinauil as ul

## 2. Tasks
Get the list of **available tasks**

In [2]:
ul.tasks

{'haspeede': {'id': 7498, 'task': 'classification'},
 'textualentailment': {'id': 8121, 'task': 'pairs'},
 'eventi': {'id': 7376, 'task': 'sequence'},
 'sentipolc': {'id': 7479, 'task': 'classification'},
 'facta': {'id': 8045, 'task': 'sequence'},
 'ironita': {'id': 7372, 'task': 'classification'}}
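
Since `ul.tasks` is a plain dictionary, it can be explored with ordinary Python. The following sketch groups the task names by their `task` type. The dictionary is copied from the output above so that the snippet runs on its own; in the notebook you would use `ul.tasks` directly. The name `tasks_by_type` is only illustrative.

```python
from collections import defaultdict

# Copy of the `ul.tasks` output shown above (use `ul.tasks` directly in the notebook).
tasks = {
    'haspeede': {'id': 7498, 'task': 'classification'},
    'textualentailment': {'id': 8121, 'task': 'pairs'},
    'eventi': {'id': 7376, 'task': 'sequence'},
    'sentipolc': {'id': 7479, 'task': 'classification'},
    'facta': {'id': 8045, 'task': 'sequence'},
    'ironita': {'id': 7372, 'task': 'classification'},
}

# Group task names by the value of their 'task' field.
tasks_by_type = defaultdict(list)
for name, info in tasks.items():
    tasks_by_type[info['task']].append(name)

print(dict(tasks_by_type))
```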

## 3. Work on a task

You can now **select** one of the available tasks using its **name**, for example `'ironita'`.

However, to access a task you need to enter the **success code** of the [ELG (European Language Grid) platform](https://live.european-language-grid.eu/). You need this code only the first time you use a task.

If you don't have this code, you need to:
1. **login** or register to ELG at [this link](https://live.european-language-grid.eu/auth/realms/ELG/protocol/openid-connect/auth?client_id=elg-oob&redirect_uri=urn:ietf:wg:oauth:2.0:oob&response_type=code&scope=openid)
2. **get the code** by visiting one of these URLs:
    - for short-term authentication that needs to get refreshed regularly: https://live.european-language-grid.eu/auth/realms/ELG/protocol/openid-connect/auth?client_id=elg-oob&redirect_uri=urn:ietf:wg:oauth:2.0:oob&response_type=code&scope=openid
    - for offline access: https://live.european-language-grid.eu/auth/realms/ELG/protocol/openid-connect/auth?client_id=elg-oob&redirect_uri=urn:ietf:wg:oauth:2.0:oob&response_type=code&scope=offline_access


In [3]:
task = ul.Task('ironita')

2023-02-15 14:02:32,559 INFO reading data for task: ironita
2023-02-15 14:02:32,561 INFO zip file already downloaded: ./data/ironita.zip


The dataset has been stored on your computer in the `./data` folder.

Now you can get a **brief description** of the current task through its `desc` attribute:

In [4]:
print(task.desc)

----------------------------------------------------------------------
Id             7372
Name           IronITA
Resource type  Corpus
Entity type    LanguageResource
Description    The IronITA dataset collects 4,849 tweets annotated for
               irony and sarcasm. The dataset has been used in the
               IroniTA task (http://www.di.unito.it/~tutreeb/ironita-
               evalita18), organised as part of EVALITA 2018
               (http://www.evalita.it/2018). <p>The dataset is divided
               into training and test data, constituted of
               respectively 3,977 and 872 tweets. In order to comply
               with GDPR privacy rules and Twitter’s policies, the
               identifiers of tweets and users have been anonymized
               and replaced by unique identifiers.
Licences       ['Creative Commons Attribution Non Commercial Share
               Alike 4.0 International']
Languages      ['Italian']
Status         p
--------------------------

### 3.1 Get training and test sets

The **training set** is stored in the `data.training_set` variable, in a JSON-like format (a list of dictionaries):

In [5]:
train = task.data.training_set
train[:3]

[{'id': '811156813181841408',
  'text': 'Zurigo, trovato morto il presunto autore della sparatoria nel centro islamico #20dicembre <URL>',
  'label': 0},
 {'id': '811183087350595584',
  'text': 'Zurigo, trovato morto il presunto autore della sparatoria nel centro islamico - <URL> tramite <URL>',
  'label': 0},
 {'id': '826380632376881152',
  'text': 'Zingari..i soliti "MERDOSI"..#cacciamolivia Roma, i rom aggrediscono un 81enne per rapinarlo. Bloccati dai cittadini <URL>',
  'label': 0}]

You can easily convert it into a **Pandas** [DataFrame](https://pandas.pydata.org/docs/reference/frame.html), if needed, using the `from_dict` method of the Pandas library (details [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.from_dict.html)):

In [6]:
import pandas as pd

df_train = pd.DataFrame.from_dict(train)
df_train[:3]

|   | id | text | label |
|---|---|---|---|
| 0 | 811156813181841408 | Zurigo, trovato morto il presunto autore della... | 0 |
| 1 | 811183087350595584 | Zurigo, trovato morto il presunto autore della... | 0 |
| 2 | 826380632376881152 | Zingari..i soliti "MERDOSI"..#cacciamolivia Ro... | 0 |


If you need to know which keys make up the **features list** and which key holds the **target**, you can use the following *metadata* of the `data` variable:
* `data.feature_keys`: list of feature keys;
* `data.target_key`: key of the target.


In [7]:
print(f"data.feature_keys: {task.data.feature_keys}")
print(f"data.target_key: {task.data.target_key}")

data.feature_keys: ['text']
data.target_key: label
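
These two metadata fields are enough to split the records into inputs and targets without hard-coding the key names. The sketch below assumes the values printed above and uses two hypothetical toy records with the same structure as the training set; in the notebook you would read the keys from `task.data` and iterate over `task.data.training_set`.

```python
# Values copied from the output above (read them from `task.data` in the notebook).
feature_keys = ['text']
target_key = 'label'

# Hypothetical toy records, shaped like the entries of the training set.
records = [
    {'id': '1', 'text': 'primo esempio', 'label': 0},
    {'id': '2', 'text': 'secondo esempio', 'label': 1},
]

# One list of feature values per record, and one target per record.
X = [[record[key] for key in feature_keys] for record in records]
y = [record[target_key] for record in records]

print(X)
print(y)
```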


If you need to know the list of possible **target values** and their **meaning**, you can use the following *metadata* of the `data` variable:
* `data.target_values`: list of the possible values of the target;
* `data.target_desc`: list of the meanings of each possible value of the target.

In [8]:
print(f"data.target_values: {task.data.target_values}")
print(f"data.target_desc: {task.data.target_desc}")

data.target_values: [0, 1]
data.target_desc: ['not ironic', 'ironic']
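
Assuming the two lists are aligned by position, as the output above suggests, they can be zipped into a lookup table that turns numeric labels into readable names. The names `label_names` and `example_labels` are illustrative and not part of the package.

```python
# Values copied from the output above (read them from `task.data` in the notebook).
target_values = [0, 1]
target_desc = ['not ironic', 'ironic']

# Map each target value to its description, assuming positional alignment.
label_names = dict(zip(target_values, target_desc))

# Hypothetical labels, only to show the lookup.
example_labels = [0, 1, 1]
readable = [label_names[label] for label in example_labels]

print(readable)
```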


The **test set** is stored in the `data.test_set` variable, in the same format:

In [9]:
test = task.data.test_set
test[:3]

[{'id': '595524450503815168',
  'text': '-Prendere i libri in copisteria-Fare la spesa-Spararmi in bocca-Farmi la doccia',
  'label': 1},
 {'id': '578468106504433665',
  'text': '...comunque con una crociera Costa se non ti ammazza Schettino prima ti spara il terrorista dopo...',
  'label': 1},
 {'id': '577791521174466560',
  'text': '“<MENTION_1> Ogni ragazza: \\"non sono una ragazza gelosa.\\"*3 minuti dopo*\\"CHI CAZZO È QUELLA PUTTANA?\\"”',
  'label': 1}]

### 3.2 Create and train a model

The `uinauil` package **does not contain models**, but only datasets for training *your* models.

You can create your model with any **external package** and train it on the training set of the task.

Here we create a fake model based on a *random generator*, which would be useless in the real world.

In [10]:
import random
random.seed(42)
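
As an alternative illustration of what "a model" could look like here, the sketch below defines a majority-class baseline in pure Python. The class `MajorityClassifier`, its `fit`/`predict` methods, and the toy records are all hypothetical and not part of `uinauil`; in the notebook you would fit it on `task.data.training_set` and predict on `task.data.test_set`.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, records, target_key='label'):
        # Count how often each label occurs and remember the most common one.
        counts = Counter(record[target_key] for record in records)
        self.prediction_ = counts.most_common(1)[0][0]
        return self

    def predict(self, records):
        # Return the same label for every record, ignoring its features.
        return [self.prediction_ for _ in records]


# Hypothetical toy data, shaped like the records of the task.
toy_train = [
    {'text': 'esempio uno', 'label': 0},
    {'text': 'esempio due', 'label': 0},
    {'text': 'esempio tre', 'label': 1},
]

model = MajorityClassifier().fit(toy_train)
toy_predictions = model.predict([{'text': 'nuovo esempio'}])
print(toy_predictions)
```

A baseline like this is mainly useful as a reference point: any real model should score above it.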


### 3.3 Evaluate your model

You can use your model to make predictions on the test set.

Here we just use a *random generator* to pick a random target value for each item of the test set.

In [11]:
target_values = task.data.target_values
predictions = [target_values[random.randint(0,len(target_values)-1)] for x in range(len(task.data.test_set))]
predictions[:5]

[0, 0, 1, 0, 0]

You can now **evaluate** the predictions with the `evaluate` method, which computes the standard metrics for the task:

In [12]:
scores = task.evaluate(predictions)
print(scores)

{'precision_0': 0.48259860788863107, 'recall_0': 0.4759725400457666, 'f1_0': 0.47926267281105994, 'precision_1': 0.48072562358276644, 'recall_1': 0.48735632183908045, 'f1_1': 0.4840182648401826, 'precision_macro': 0.4816621157356987, 'recall_macro': 0.4816644309424235, 'f1_macro': 0.4816404688256213, 'accuracy': 0.481651376146789}
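
To make the keys of this dictionary easier to read, the sketch below computes per-class precision, recall and F1, plus macro F1 and accuracy, on a small hypothetical example. It assumes the usual textbook definitions; the package's own implementation is not shown in this notebook and may differ. The helper `per_class_scores` and the toy labels are illustrative only.

```python
def per_class_scores(gold, predicted, label):
    """Precision, recall and F1 for one label, using the standard definitions."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical gold labels and predictions.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]

scores_0 = per_class_scores(gold, predicted, 0)
scores_1 = per_class_scores(gold, predicted, 1)

# Macro F1 is the unweighted mean of the per-class F1 values.
f1_macro = (scores_0[2] + scores_1[2]) / 2
accuracy = sum(1 for g, p in zip(gold, predicted) if g == p) / len(gold)

print(scores_0, scores_1, f1_macro, accuracy)
```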
