# Testing and evaluation of multiple supervised classification solutions
The dataset consists of the [base statistics](https://leagueoflegends.fandom.com/wiki/List_of_champions/Base_statistics) of [League of Legends](https://www.leagueoflegends.com/fr-fr/) champions.

The 2 targets are:
  - the recommended [lanes](https://leagueoflegends.fandom.com/wiki/Lanes) for the champion in normal game mode (which of the 5 lanes the champion is suited to play), encoded as 5 sub-targets.
  - the champion's [class](https://leagueoflegends.fandom.com/wiki/Champion_classes) (defines the general role of the champion).
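
To make "encoded as 5 sub-targets" concrete, the sketch below turns a list of recommended lanes into five 0/1 flags (a multi-hot encoding). The flag names `isTop` to `isSup` are the target columns of the dataset; the champion names, their lane lists and the helper `encode_lanes` are hypothetical and only serve as illustration.

```python
# Lane order matches the five sub-target columns of the dataset.
LANES = ["Top", "Jgl", "Mid", "Bot", "Sup"]

def encode_lanes(recommended):
    """Return one 0/1 flag per lane (multi-hot encoding)."""
    return {f"is{lane}": int(lane in recommended) for lane in LANES}

# Hypothetical champions, not rows taken from the real dataset.
examples = {
    "ChampionA": ["Top", "Jgl"],
    "ChampionB": ["Sup"],
}
encoded = {name: encode_lanes(lanes) for name, lanes in examples.items()}
for name, flags in encoded.items():
    print(name, flags)
```

A champion can have several flags set at once, which is why the lanes form five binary targets rather than a single multi-class target.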

## Table of contents
- [install/import](#install--import)
  - [Data description](#Data-description)
- [Data preprocessing](#dataPreprocessing)
  - [Feature selection](#feature-selection)
- [Classification](#classification)
  - [k-nearest neighbors](#k-nearest-neighbors)
    - [Feature Scaling](#feature-scaling)
    - [KNN training and valuation](#knn-training-and-valuation)
  - Naive Bayes
    - Naive Bayes training and valuation
  - Decision Trees
    - Decision Trees training and valuation
- Conclusion
- Classification of custom entity 

## install / import

In [19]:
#%pip install scikit-learn
#%pip install numpy
#%pip install pandas
#%pip install matplotlib
#%pip install seaborn

import numpy as np
import pandas as pd
import seaborn as sns
import sklearn

In [2]:
df = pd.read_csv("./projetStatsChampsLol.csv")
print(df)

          Name   HP    HP+   HP5  HP5+   MP   MP+    MP5  MP5+  AD  ...   MR+  \
0       Aatrox  650  114.0  3.00  1.00    0   0.0   0.00  0.00  60  ...  2.05   
1         Ahri  590  104.0  2.50  0.60  418  25.0   8.00  0.80  53  ...  1.30   
2        Akali  570  119.0  9.00  0.90  200   0.0  50.00  0.00  62  ...  2.05   
3       Akshan  630  107.0  3.75  0.65  350  40.0   8.20  0.70  52  ...  1.30   
4      Alistar  685  120.0  8.50  0.85  350  40.0   8.50  0.80  62  ...  2.05   
..         ...  ...    ...   ...   ...  ...   ...    ...   ...  ..  ...   ...   
166       Zeri  630  110.0  3.25  0.70  250  45.0   6.00  0.80  56  ...  1.30   
167      Ziggs  606  106.0  6.50  0.60  480  23.5   8.00  0.80  55  ...  1.30   
168     Zilean  574   96.0  5.50  0.50  452  50.0  11.35  0.80  52  ...  1.30   
169        Zoe  630  106.0  7.50  0.60  425  25.0   8.00  0.65  58  ...  1.30   
170       Zyra  574   93.0  5.50  0.50  418  25.0   7.00  0.80  53  ...  1.30  

## Data description
Load the dataset and quickly check it for potential problems.
- Features presentation:
  - Name : champion name, useless for classification but more readable than row numbers in graphical representations.
  - Variables annotated with '+' give the scaling value added to the initial one at each champion level, from 1 to 18 over the course of the game.
  - HP : number of health points.
    - HP5 : number of health points recovered per 5 seconds.
  - MP : mana points, or energy points for some champions.
    - MP5 : number of mana/energy points recovered per 5 seconds.
  - AD : Attack Damage, physical pre-mitigation damage dealt by a basic attack (right click).
  - AS : Attack Speed, maximum number of attacks the champion can perform per second.
    - the '+' scaling value is a percentage added to the implied "Bonus Attack Speed" value, so that **effective AS** = **base AS** × (1 + **bonus AS**).
  - AR : Armor, parameter of a non-linear function that computes the physical damage actually taken (post-mitigation) from the physical pre-mitigation damage.
  - MR : Magic Resistance, same as armor but for magic damage.
  - MS : Movement Speed, number of in-game distance units traveled per second.
  - Ranged : whether the champion counts as ranged; it affects access to and use of certain in-game items.
  - Range : radius around the champion within which basic attacks can be made.
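
The two rules described in the list (per-level '+' scaling and the attack-speed formula) can be sketched as follows. This is a minimal sketch that takes the description literally: it assumes the '+' value is added once per level gained, which may not match the exact in-game growth curve. The helper names and the attack-speed numbers are hypothetical; the HP values are Aatrox's `HP` and `HP+` from the DataFrame printed earlier.

```python
def stat_at_level(base, per_level, level):
    """Linear reading of the '+' columns: base value plus one increment per level gained.

    Assumption: the real in-game growth curve may not be exactly linear.
    """
    if not 1 <= level <= 18:
        raise ValueError("level must be between 1 and 18")
    return base + per_level * (level - 1)

def effective_attack_speed(base_as, bonus_as_percent):
    """effective AS = base AS * (1 + bonus AS), with the bonus given in percent."""
    return base_as * (1 + bonus_as_percent / 100)

# Aatrox: HP = 650, HP+ = 114.0
print(stat_at_level(650, 114.0, 1))    # 650.0
print(stat_at_level(650, 114.0, 18))   # 2588.0
# Hypothetical attack-speed values, for illustration only.
print(effective_attack_speed(0.65, 40))
```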

  "Ranged" is the only non-numerical var but is directly linked to the "range" one, it's used to avoid confusion between certain champions: for example, Urgot is ranged but with a small 350 range for his category, while Lillia is not ranged, even though she has a range of 325. This variable is really important, as some in-game items are only available for one of the two categories.

Target presentation  
- Lanes : the lane normally defines the player's objective and gameplay for the current match.
  - "Top" : starts in the top lane; usually has one of two play styles: splitpushing or engaging teamfights.
  - "Jgl" : plays in the jungle, collecting resources from neutral monsters. Their goal is to help the 3 lanes through ganks that create a numerical advantage. They must also regularly prepare the ground for defeating epic monsters with their team during the match.
  - "Mid" : starts in the middle lane; has three objectives: gank the side lanes (top and bot), help the Jgl (jungler) defeat epic monsters, and eliminate problematic targets in teamfights.
  - "Bot" : starts in the bottom lane; almost exclusively an AD carry (marksman) or AP carry (battlemage). The aim is to gain as many resources as possible in order to DPS effectively in teamfights or push quickly.
  - "Sup" : support. Their goal is to help their Bot gain a considerable amount of resources in the first few minutes of the game, then to help all the lanes. They normally leave the resources to their allies.
- Class : 
  - defines the general role of the champion  
  - "marksman"   : use their long-range damage and basic attacks to DPS (deal constant Damage Per Second) enemies
  - "assassin"   : able to penetrate enemy defenses to eliminate weak targets, thanks to their high mobility and ability to avoid incoming damage
  - "burst"      : mages whose aim is to eliminate one or two targets with a combo of a few spells, then step back
  - "diver"      : able to penetrate enemy defenses to eliminate weak targets, thanks to their high mobility
  - "vanguard"   : lead the charge for their team and specialize in bringing the action, thanks to high resistance, mobility and control capabilities
  - "specialist" : psychological zoners who control enemy pathing with special positioning or zone spells that dissuade an opponent from approaching, at the risk of exposing themselves to a violent counter-attack
  - "juggernaut" : melee titans who excel at both dealing and taking significant amounts of damage, but have low range and poor mobility
  - "skirmisher" : also called duelists, built to win a 1v1 against any enemy and survive if others arrive, thanks to strong defensive or holding capabilities
  - "battlemage" : mages who are very effective at short-range area damage
  - "enchanter"  : amplify their allies' effectiveness by healing or boosting them
  - "catcher"    : control enemy movements, but depend on their allies because of low damage resistance or low damage
  - "warden"     : defensive tanks. Wardens stand steadfast, seeking to hold the line by persistently locking down any on-comers who try to pass them.
  - "artillery"  : mages who excel at long-range magic damage
 
[glossary](https://www.progressersurleagueoflegends.fr/guides/guides-connaissances/lexique-league-of-legends/) : Splitpush, teamfights, jungle, tank, gank, DPS, carry, etc.

# Data preprocessing

## Feature selection

**Check for missing values :**
no missing values.

In [10]:
print(df.isna().sum().sum())

0


In [49]:
qualitative_vars = ["Ranged"]
target_vars = ["isTop","isJgl","isMid","isBot","isSup","Class"]
quantitative_vars = [col for col in df.columns if col not in qualitative_vars + target_vars + ["Name"]]
print(qualitative_vars)
print(quantitative_vars)
print(target_vars)
X_qualitative = df[qualitative_vars]
X_quantitative = df[quantitative_vars]
X_total = pd.concat([X_quantitative,X_qualitative], axis=1)
Y = df[target_vars]

['Ranged']
['HP', 'HP+', 'HP5', 'HP5+', 'MP', 'MP+', 'MP5', 'MP5+', 'AD', 'AD+', 'AS', 'AS+', 'AR', 'AR+', 'MR', 'MR+', 'MS', 'Range']
['isTop', 'isJgl', 'isMid', 'isBot', 'isSup', 'Class']


In [16]:
# TODO: what needs to be explained here?

# Classification
## k-nearest neighbors

== explain KNN
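
As a reminder of the general idea: KNN keeps the training points in memory and labels a new point by a vote among its k closest training points. Closeness is measured here with the Minkowski distance of order p (p=1 is the Manhattan distance, p=2 the Euclidean one), and with `weights="distance"` each neighbor's vote is weighted by the inverse of its distance. The sketch below is a hand-written illustration of that principle, not the scikit-learn implementation; the function names, toy points and labels are hypothetical.

```python
from collections import defaultdict

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train_X, train_y, query, k=3, p=2):
    """Label `query` by an inverse-distance-weighted vote of its k nearest neighbors."""
    neighbors = sorted(
        (minkowski(x, query, p), label) for x, label in zip(train_X, train_y)
    )[:k]
    # An exact match (distance 0) decides the label on its own.
    for dist, label in neighbors:
        if dist == 0:
            return label
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / dist
    return max(votes, key=votes.get)

# Toy 2-D points (hypothetical, already scaled to [0, 1]).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
train_y = ["classA", "classA", "classB", "classB"]
print(knn_predict(train_X, train_y, (0.15, 0.15), k=3, p=2))  # classA
```

Because the vote depends directly on distances, features with large numeric ranges would dominate the result, which is why the features are scaled before training.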

### Feature scaling

In [76]:
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Rescale every quantitative feature to [0, 1] so that no feature dominates the KNN distances
min_max_scaler = MinMaxScaler()
scaled_quantitative = min_max_scaler.fit_transform(X_quantitative)

# Rebuild a DataFrame (scaled columns are numbered 0..n-1) and append the qualitative feature
X_KNN = pd.concat([pd.DataFrame(scaled_quantitative), X_qualitative], axis=1)
X_KNN.columns = X_KNN.columns.astype(str)  # avoid mixing int and str column names
print(X_KNN)

# Encode the champion class as integer labels, stored as a column vector of shape (n_samples, 1)
label_encoder = LabelEncoder()
encoded_y = label_encoder.fit_transform(Y.Class).reshape(-1, 1)
print(f"===== Encoded classes: {label_encoder.classes_}")

          0         1      2     3         4         5      6     7         8  \
0    0.6000  0.544484  0.300  0.80  0.000000  0.000000  0.000  0.00  0.689655   
1    0.4500  0.423488  0.250  0.48  0.788679  0.287356  0.160  0.80  0.448276   
2    0.4000  0.604982  0.900  0.72  0.377358  0.000000  1.000  0.00  0.758621   
3    0.5500  0.459786  0.375  0.52  0.660377  0.459770  0.164  0.70  0.413793   
4    0.6875  0.617082  0.850  0.68  0.660377  0.459770  0.170  0.80  0.758621   
..      ...       ...    ...   ...       ...       ...    ...   ...       ...   
166  0.5500  0.496085  0.325  0.56  0.471698  0.517241  0.120  0.80  0.551724   
167  0.4900  0.447687  0.650  0.48  0.905660  0.270115  0.160  0.80  0.517241   
168  0.4100  0.326690  0.550  0.40  0.852830  0.574713  0.227  0.80  0.413793   
169  0.5500  0.447687  0.750  0.48  0.801887  0.287356  0.160  0.65  0.620690   
170  0.4100  0.290391  0.550  0.40  0.788679  0.287356  0.140  0.80  0.448276   

            9        10    
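
The min-max rule behind this output is x' = (x - min) / (max - min). The sketch below is a hand-written check on the `HP` column, not the `MinMaxScaler` code itself. The four HP values come from the DataFrame printed earlier; the column minimum (410) and maximum (810) are assumptions inferred from the scaled values in column 0, not read from the CSV.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant column: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

# HP of Aatrox, Ahri, Akali and Alistar, followed by the inferred
# column minimum and maximum (assumed values, see the note above).
hp = [650, 590, 570, 685, 410, 810]
scaled = min_max_scale(hp)
print([round(s, 4) for s in scaled])   # [0.6, 0.45, 0.4, 0.6875, 0.0, 1.0]
```

The first four results match column 0 of the printed output for rows 0, 1, 2 and 4.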

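The data is split once into 4 datasets (X_train, X_test, y_train, y_test) with a 50% train / 50% test ratio (`test_size=0.5`), first with a plain random split and then with a split stratified on the class. For each split, a KNN is trained and evaluated for each p in [1,2,3] and each k in [1,3,5,7].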
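
The effect of the `stratify` argument of `train_test_split` can be seen on a small example. This is a minimal sketch on hypothetical, deliberately imbalanced toy labels (not the champion classes); `random_state=0` is only there to make the sketch repeatable.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy labels (hypothetical): 10 x "a", 6 x "b", 4 x "c".
labels = ["a"] * 10 + ["b"] * 6 + ["c"] * 4
samples = list(range(len(labels)))   # stand-ins for feature rows

# Plain random split: class proportions in the test set depend on the draw.
_, _, _, y_test_plain = train_test_split(
    samples, labels, test_size=0.5, random_state=0
)

# Stratified split: each class keeps its share in both halves.
_, _, _, y_test_strat = train_test_split(
    samples, labels, test_size=0.5, random_state=0, stratify=labels
)

print("plain     :", sorted(Counter(y_test_plain).items()))
print("stratified:", sorted(Counter(y_test_strat).items()))
```

With many classes and few samples per class, stratification keeps rare classes present in both the training and the test set. Note that a stratified split needs at least two samples in every class.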

### KNN training and valuation
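
The cell below reports macro-averaged precision, recall and F1 with `zero_division=0`. Macro averaging computes each metric per class and then takes the unweighted mean, so a class that is never predicted correctly pulls the average down as much as a large class would. The sketch below is a hand-written illustration of that definition on hypothetical toy labels, not the scikit-learn code; `macro_scores` is a made-up helper name.

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall and F1; undefined ratios count as 0."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy labels (hypothetical): class 2 is never predicted.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 1, 1]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.3333 0.5 0.4
```

In this toy case the accuracy is 3/6 = 0.5 while the macro F1 is 0.4, because the class that is never predicted contributes a score of 0.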

In [96]:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import recall_score, precision_score, f1_score, accuracy_score

y = encoded_y.ravel()
# Two 50/50 splits: a plain random one, then one stratified on the class labels
for split_name, stratify in (("split", None), ("stratified split", y)):
    X_train, X_test, y_train, y_test = train_test_split(X_KNN, y, test_size=0.5, stratify=stratify)
    for p in (1, 2, 3):           # order of the Minkowski distance
        for k in range(1, 9, 2):  # k = 1, 3, 5, 7
            KNN = KNeighborsClassifier(n_neighbors=k, weights="distance", p=p)
            KNN.fit(X_train, y_train)
            prediction = KNN.predict(X_test)
            print(f"========= KNN {split_name} k={k} minkowski={p} ============")
            print(f"accuracy= {accuracy_score(y_test, prediction)}")
            print(f"precision= {precision_score(y_test, prediction, average='macro', zero_division=0)}")
            print(f"recall= {recall_score(y_test, prediction, average='macro', zero_division=0)}")
            print(f"f1-score= {f1_score(y_test, prediction, average='macro', zero_division=0)}")
    print("end")


accuracy= 0.3023255813953488
precision= 0.2892978816055739
recall= 0.28502053502053504
f1-score= 0.2510284587207664
accuracy= 0.3023255813953488
precision= 0.2673992673992674
recall= 0.289016539016539
f1-score= 0.25124499783890736
accuracy= 0.29069767441860467
precision= 0.2194946079561464
recall= 0.2862970362970363
f1-score= 0.22854090354090356
accuracy= 0.27906976744186046
precision= 0.18293329477540005
recall= 0.242007992007992
f1-score= 0.20118227851021372
accuracy= 0.3488372093023256
precision= 0.3433566433566433
recall= 0.2975080475080475
f1-score= 0.2966455113513937
accuracy= 0.3023255813953488
precision= 0.27319004524886875
recall= 0.2517205017205017
f1-score= 0.23726102956872186
accuracy= 0.3023255813953488
precision= 0.2615869424692954
recall= 0.2535520035520035
f1-score= 0.23113580956390317
accuracy= 0.27906976744186046
precision= 0.21675824175824177
recall= 0.23373848373848372
f1-score= 0.20585488154174458
accuracy= 0.3023255813953488
precision= 0.2602564102564103
recall= 0

no