<img src='vpython.png' width="230px">

# <font color='#333333'>Homework2 - Weedle's Cave<font>

You are entering Weedle's Cave, where many Pokemon with very different characteristics are fighting. The goal is to build a Machine Learning algorithm that predicts which Pokemon will win each fight. Will you be able to predict the outcome of future matches?

The dataset is taken from <a href="https://www.kaggle.com/terminus7/pokemon-challenge">https://www.kaggle.com/terminus7/pokemon-challenge</a>
<br>
<br>
<img src='pokemon.png' width="800px">
<br>
<br>

### <font color='#333333'>0- Prerequisites<font>

#### <font color='#555555'>Vertica-ML-Python Library installation.<font>

You can download the library using this <a href="https://github.com/vertica/vertica_ml_python">link</a>.

To install it, follow the instructions in the documentation. This is version 0.1, and drastic changes may occur in version 1.0. Do not hesitate to send me your feedback at <a href="mailto:badr.ouali@microfocus.com"> badr.ouali@microfocus.com </a>

#### <font color='#555555'>SQL Magic.<font>
    
For this study, you also need to install SQL magic. The following steps will help you install it, but the installation varies from one system to another.

You will need SQLAlchemy 1.1.11 or higher before installing sqlalchemy-vertica.

Run

    conda update sqlalchemy

This updates sqlalchemy if you used Anaconda to install your Python environment.

To install sqlalchemy-vertica run

    pip install sqlalchemy-vertica[pyodbc,vertica-python]

There are many dependencies, such as psycopg2, six and pyodbc. Anaconda will take care of most of these. If you did not use Anaconda, look at the error log to find the dependencies you need to install manually.

You may need to install pyodbc manually by running

    pip install pyodbc

After sqlalchemy-vertica is installed ensure ipython-sql is installed by running

    pip install ipython-sql 
    
You can then use the following command.

In [1]:
%load_ext sql
%sql vertica+pyodbc://VerticaDSN

'Connected: None@None'

#### <font color='#555555'>Connection to the Vertica Database using ODBC.<font>
    
To use the Vertica ML Python library and start exploring, we first need to connect to the Vertica database. You can use an ODBC or a JDBC connection. We suggest ODBC because the code is simpler. In addition, the cursor is required for all the ML parts.

In [2]:
import pyodbc
cur=pyodbc.connect("DSN=VerticaDSN").cursor()

You can then import the different functions to create the RVD from the CSV file.

In [3]:
from vertica_ml_python import RVD
from vertica_ml_python import read_csv # This function will help us to load the csv file in the Database.
from vertica_ml_python import drop_table # This function will help us to drop the unnecessary tables. 

In [4]:
drop_table("pokemon",cur)
drop_table("combats",cur)

The table pokemon was successfully dropped.
The table combats was successfully dropped.


We can then parse the two csv files.

In [5]:
pokemons=read_csv('./pokemon.csv',cur)

The parser guess the following columns and types:
Attack: Integer
Defense: Integer
Generation: Integer
HP: Integer
ID: Integer
Legendary: Boolean
Name: Varchar(50)
Sp_Atk: Integer
Sp_Def: Integer
Speed: Integer
Type_1: Varchar(20)
Type_2: Varchar(20)
Illegal characters in the columns names will be erased.
Is any type wrong?
If one of the types is not correct, it will be considered as Varchar(100).
0 - There is one type that I want to modify.
1 - I wish to continue.
2 - I wish to see the columns and their types again.
1
The table pokemon has been successfully created.


In [6]:
combats=read_csv('./combats.csv',cur)

The parser guess the following columns and types:
First_pokemon: Integer
Second_pokemon: Integer
Winner: Integer
Illegal characters in the columns names will be erased.
Is any type wrong?
If one of the types is not correct, it will be considered as Varchar(100).
0 - There is one type that I want to modify.
1 - I wish to continue.
2 - I wish to see the columns and their types again.
1
The table combats has been successfully created.


When the file is big, read_csv is not a suitable solution; it is better to load the data with a direct SQL query. From here on, we assume that the data are already in the database. If the table already exists in the DB, you can use the following syntax to create the RVD.

In [7]:
pokemons=RVD('pokemon',dsn="VerticaDSN")
combats=RVD('combats',dsn="VerticaDSN")
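
For large files, the direct SQL route mentioned above could rely on Vertica's `COPY ... FROM LOCAL` statement. The sketch below only builds the statement as a string; `build_copy_statement` is a hypothetical helper (not part of Vertica-ML-Python), and it assumes that the target table already exists and that the CSV has a header row and comma-separated, optionally double-quoted fields.

```python
def build_copy_statement(table, csv_path, delimiter=","):
    """Return a Vertica COPY ... FROM LOCAL statement (hypothetical helper)."""
    # NOTE: the values are interpolated directly, so pass only trusted names and paths.
    # SKIP 1 assumes that the first line of the file is a header row.
    return (
        f"COPY {table} FROM LOCAL '{csv_path}' "
        f"DELIMITER '{delimiter}' ENCLOSED BY '\"' SKIP 1;"
    )


statement = build_copy_statement("combats", "./combats.csv")
print(statement)
# With an open cursor such as `cur` from In [2], one could then run:
# cur.execute(statement)
```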

The first dataset contains all the Pokemon and the second all the fights.

In [8]:
pokemons.filter("Name='Pikachu'")
print(pokemons)
pokemons.undo_filter()

799 elements were filtered


0,1,2,3,4,5,6,7,8,9,10,11,12
,Attack,Defense,Generation,HP,ID,Legendary,Name,Sp_Atk,Sp_Def,Speed,Type_1,Type_2
0.0,55,40,1,35,31,False,Pikachu,50,50,90,Electric,
,...,...,...,...,...,...,...,...,...,...,...,...


Name: pokemon, Number of rows: 1, Number of columns: 12


For example, Pikachu has 35 life points (HP) and is not Legendary. His type is Electric, and his special attack and defense are about as strong as his regular ones.
<br>
<img src='pikachu.png' width="250px">
<br>
The second dataset represents all the matches. The following fight is between Pokemon ID6 and ID15.

In [9]:
combats.filter("First_pokemon=6 and Second_pokemon=15")
print(combats)
combats.undo_filter()

49999 elements were filtered


0,1,2,3
,First_pokemon,Second_pokemon,Winner
0.0,6,15,6
,...,...,...


Name: combats, Number of rows: 1, Number of columns: 3


We can see that the winner is Pokemon ID6. Let's try to understand why.

In [10]:
pokemons.filter("(ID=6 or ID=15)")
print(pokemons)
pokemons.undo_filter()

798 elements were filtered


0,1,2,3,4,5,6,7,8,9,10,11,12
,Attack,Defense,Generation,HP,ID,Legendary,Name,Sp_Atk,Sp_Def,Speed,Type_1,Type_2
0.0,64,58,1,58,6,False,Charmeleon,80,65,80,Fire,
1.0,20,55,1,50,15,False,Metapod,25,25,30,Bug,
,...,...,...,...,...,...,...,...,...,...,...,...


Name: pokemon, Number of rows: 2, Number of columns: 12


The fight is between Charmeleon and Metapod. The first is stronger than the second on every statistic shown above, so it is no surprise that he wins.

<img src='charmeleon.png' width="200px" style="float:left;">
<img src='metapod.png' width="200px">

A Pokemon has a regular attack/defense (comparable to physical attack/defense) and a special attack/defense (comparable to psychic attack/defense). He also has one or two types, some life points and a speed. To learn more about Pokemon, please read the relevant <a href="https://en.wikipedia.org/wiki/Pok%C3%A9mon">Wikipedia</a> page.

Let's now have some fun and use Vertica-ML-Python together with direct SQL to understand a bit more about what we can do with Vertica.

### <font color='#333333'>1- First Data Exploration<font>

In this section, we will focus on one Pokemon and try to understand why he loses or wins his fights.

#### <font color='#000000'>1.0- Filter the combats dataset and select only fights where Pokemon ID141 is involved.<font>

#### <font color='#000000'>1.1- Print the Pokemon ID141 characteristics.<font>

...

#### <font color='#000000'>1.2- Create a new feature inside the combats dataset which indicates if Gyarados won the fight or not.<font>

#### <font color='#000000'>1.3- Draw the Gyarados results histogram. Are our expectations good?<font>

...

#### <font color='#000000'>1.4- Look at Gyarados losses. Try to understand why he lost.<font>

...

### <font color='#333333'>2- First Data Preparation<font>

In this section, we will build the table that we will use for prediction. We will only consider the main characteristics, excluding the Pokemon types, generation and legendary status.

#### <font color='#000000'>2.0- Join the pokemon table to the combats table on First Pokemon and Second Pokemon. Create the new features which will be the difference between Pokemon1 characteristics and Pokemon2 characteristics (for example diff_HP will be Pokemon1.HP-Pokemon2.HP). Redefine the 'Winner' feature as 1 if Pokemon1 won, 0 otherwise.<font>
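
As a hint, the shape of such a query can be tried outside Vertica first. The sketch below uses Python's built-in `sqlite3` module with an in-memory database, which is an assumption made only so the example is self-contained. It loads the Charmeleon vs Metapod fight from In [9] and In [10], keeps just two of the characteristics, and joins the pokemon table twice (aliases `p1` and `p2` are illustrative).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pokemon (ID INTEGER, Name TEXT, HP INTEGER, Speed INTEGER);
    CREATE TABLE combats (First_pokemon INTEGER, Second_pokemon INTEGER, Winner INTEGER);
    -- Rows copied from the outputs of In [9] and In [10].
    INSERT INTO pokemon VALUES (6, 'Charmeleon', 58, 80), (15, 'Metapod', 50, 30);
    INSERT INTO combats VALUES (6, 15, 6);
""")

query = """
    SELECT
        p1.HP - p2.HP AS diff_HP,
        p1.Speed - p2.Speed AS diff_Speed,
        -- 1 if the first Pokemon won, 0 otherwise
        CASE WHEN c.Winner = c.First_pokemon THEN 1 ELSE 0 END AS Winner
    FROM combats AS c
    JOIN pokemon AS p1 ON c.First_pokemon = p1.ID
    JOIN pokemon AS p2 ON c.Second_pokemon = p2.ID
"""
rows = conn.execute(query).fetchall()
print(rows)  # one row: (diff_HP, diff_Speed, Winner)
```

The same pattern extends to the other characteristics (Attack, Defense, Sp_Atk, Sp_Def); the Vertica version goes in the `%%sql` cell.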

In [None]:
%%sql


#### <font color='#000000'>2.1- Create the RVD of this new table and print the first 10 elements. What do you notice?<font>

...

#### <font color='#000000'>2.2- Draw the hexbin of diff_HP vs diff_Speed using as aggregation the mean of Winner. Draw the histogram of diff_HP vs Winner. What do you notice?<font>

...

#### <font color='#000000'>2.3- Which algorithm available on Vertica can you use for this study?<font>

...

### <font color='#333333'>3- First Model<font>
    
In this part, we will use a logistic regression to predict the outcome of Pokemon fights.

#### <font color='#000000'>3.0- Separate the RVD into two views (train/test).<font>
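
The idea of a train/test split can be sketched in plain Python before doing it in the database. The helper `split_row_ids` below is hypothetical and the 70/30 ratio is an assumption; it shuffles row indices with a fixed seed so that the split is reproducible and the two sets never overlap.

```python
import random


def split_row_ids(n_rows, test_percent=30, seed=0):
    """Shuffle row indices and return (train_ids, test_ids)."""
    rng = random.Random(seed)  # fixed seed: the same split on every run
    ids = list(range(n_rows))
    rng.shuffle(ids)
    n_test = n_rows * test_percent // 100  # integer arithmetic avoids rounding surprises
    return sorted(ids[n_test:]), sorted(ids[:n_test])


# The combats table holds 50000 fights (see the output of In [9]).
train_ids, test_ids = split_row_ids(50000)
print(len(train_ids), len(test_ids))
```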

#### <font color='#000000'>3.1- Build a logistic regression model to predict the match outcomes.<font>

#### <font color='#000000'>3.2- Evaluate the feature importance. What do you notice?<font>
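
To make clear what a fitted logistic regression computes, here is a minimal sketch in plain Python. The weights are hypothetical values chosen only for illustration (they are not fitted on the data), and `win_probability` is not a Vertica-ML-Python function. The model turns a weighted sum of the difference features into a probability that Pokemon1 wins.

```python
import math


def win_probability(diffs, weights, intercept=0.0):
    """Logistic model: sigmoid of the weighted sum of the difference features."""
    z = intercept + sum(w * x for w, x in zip(weights, diffs))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for (diff_HP, diff_Speed), not learned from the data.
weights = (0.01, 0.05)

# Charmeleon vs Metapod: diff_HP = 58 - 50, diff_Speed = 80 - 30.
p_first_wins = win_probability((8, 50), weights)
p_reversed = win_probability((-8, -50), weights)
print(p_first_wins, p_reversed)
```

With a zero intercept, swapping the two Pokemon flips the sign of every difference, so the two probabilities sum to 1.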

...

#### <font color='#000000'>3.3- Evaluate your model.<font>
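
As a reminder of the usual metrics, the sketch below computes the confusion counts, accuracy, precision and recall for a binary target in plain Python. The labels are made-up toy values, not results from the Pokemon data, and `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, tn, fp, fn, accuracy, precision, recall)
```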

...

### <font color='#333333'>4- Finding other important variables<font>
    
In this part, we will use all the variables except the Pokemon type, which is also really important but needs substantial transformation before it can be used.

#### <font color='#000000'>4.0- Create the same table as in question 2.0, adding the generation and the legendary status of Pokemon1 and Pokemon2 (4 more features).<font>

In [None]:
%%sql


#### <font color='#000000'>4.1- Show that the Generation has no influence on the prediction.<font>

...

#### <font color='#000000'>4.2- Show the influence of being a legendary Pokemon on the prediction.<font>

...

### <font color='#333333'>5- The Pokemon's type importance<font>

#### <font color='#000000'>5.0- Remove the two irrelevant features from the table and add the main type of Pokemon1 and Pokemon2.<font>

In [None]:
%%sql


#### <font color='#000000'>5.1- Using two donut charts, show that the Pokemon1's type and the Pokemon2's type have the same distribution.<font>

#### <font color='#000000'>5.2- Create a new table with the new Type1_2 variable, which is the concatenation of the first type of Pokemon1 and the first type of Pokemon2. If the two types are the same, it is preferable to use a new value 'similar'.<font>
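
The rule itself is small enough to sketch in plain Python before writing the SQL. The function name `combine_types` and the underscore separator are assumptions made for this illustration.

```python
def combine_types(pokemon1_type, pokemon2_type):
    """Build the Type1_2 value from the first type of each Pokemon."""
    if pokemon1_type == pokemon2_type:
        return "similar"  # one shared value for every same-type fight
    return f"{pokemon1_type}_{pokemon2_type}"


print(combine_types("Fire", "Bug"))
print(combine_types("Water", "Water"))
```

Grouping all same-type fights under one value reduces the number of distinct categories that have to be encoded later.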

In [None]:
%%sql


#### <font color='#000000'>5.3- Create a new RVD of the table and draw a fully stacked bar chart of the new variable (use limit_distinct_elements to keep the chart readable).<font>

#### <font color='#000000'>5.4- Use a label encoding (for space optimization inside the DB) and then a one hot encoder to encode the variable.<font>
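
To show what the two encodings produce, here is a minimal sketch in plain Python on a handful of made-up Type1_2 values. `label_encode` and `one_hot` are hypothetical helpers, not Vertica-ML-Python functions.

```python
def label_encode(values):
    """Map each distinct category to a small integer code."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot(codes, n_categories):
    """Turn each integer code into a 0/1 indicator vector."""
    return [[1 if code == k else 0 for k in range(n_categories)] for code in codes]


# Toy values for illustration only.
values = ["Fire_Bug", "similar", "Water_Fire", "Fire_Bug"]
codes, mapping = label_encode(values)
encoded = one_hot(codes, len(mapping))
print(mapping)
print(codes)
print(encoded)
```

Storing the integer codes is cheaper than storing the strings, and the indicator columns are what the logistic regression actually consumes.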

#### <font color='#000000'>5.5- Save the RVD as a view and build a Logistic Regression model using the new features (no train/test split is needed, as there is no risk of overfitting with these data and logit; you must use BFGS as the optimizer). Do not worry if it takes a long time.<font>

#### <font color='#000000'>5.6- Evaluate your model. What can you say about the method?<font>

...

#### <font color='#000000'>5.7- We will now use a smarter method. If you don't know the data you are dealing with, it is really hard to create smart features. Each Pokemon type has strengths and weaknesses. The following table shows them.<font>
    
<table border="0" cellspacing="0" cellpadding="0"><colgroup><col span="4" /> </colgroup>
<tbody>
<tr>
<td style="text-align: center;"><strong>Pokemon Type</strong></td>
<td style="text-align: center;"><strong>Strong Against</strong></td>
<td style="text-align: center;"><strong>Weak Against</strong></td>
<td style="text-align: center;"><strong>Other</strong></td>
</tr>
<tr>
<td style="text-align: center;">Bug</td>
<td>Dark, Grass, Psychic</td>
<td>Fire, Flying, Rock</td>
<td style="text-align: center;">N/A</td>
</tr>
<tr>
<td style="text-align: center;">Dark</td>
<td>Ghost, Psychic</td>
<td>Bug, Fairy, Fighting</td>
<td>Immune to Psychic</td>
</tr>
<tr>
<td style="text-align: center;">Dragon</td>
<td>Dragon</td>
<td>Dragon, Fairy, Ice</td>
<td>Not Effective Against Fairy</td>
</tr>
<tr>
<td style="text-align: center;">Electric</td>
<td>Flying, Water</td>
<td>Ground</td>
<td>Not Effective Against Ground</td>
</tr>
<tr>
<td style="text-align: center;">Fairy</td>
<td>Dark, Dragon, Fighting</td>
<td>Poison, Steel</td>
<td>Immune to Dragon</td>
</tr>
<tr>
<td style="text-align: center;">Fighting</td>
<td>Dark, Ice, Normal, Rock, Steel</td>
<td>Fairy, Flying, Psychic</td>
<td>Not Effective Against Ghost</td>
</tr>
<tr>
<td style="text-align: center;">Fire</td>
<td>Bug, Grass, Ice, Steel</td>
<td>Ground, Rock, Water</td>
<td style="text-align: center;">N/A</td>
</tr>
<tr>
<td style="text-align: center;">Flying</td>
<td>Bug, Fighting, Grass</td>
<td>Electric, Ice, Rock</td>
<td>Immune to Ground</td>
</tr>
<tr>
<td style="text-align: center;">Ghost</td>
<td>Ghost, Psychic</td>
<td>Dark, Ghost</td>
<td>Immune to Fighting, Normal; Not Effective Against Normal</td>
</tr>
<tr>
<td style="text-align: center;">Grass</td>
<td>Ground, Rock, Water</td>
<td>Bug, Fire, Flying, Ice, Poison</td>
<td style="text-align: center;">N/A</td>
</tr>
<tr>
<td style="text-align: center;">Ground</td>
<td>Electric, Fire, Poison, Rock, Steel</td>
<td>Grass, Ice, Water</td>
<td>Immune to Electric; Not Effective Against Flying</td>
</tr>
<tr>
<td style="text-align: center;">Ice</td>
<td>Dragon, Flying, Grass, Ground</td>
<td>Fighting, Fire, Rock, Steel</td>
<td style="text-align: center;">N/A</td>
</tr>
<tr>
<td style="text-align: center;">Normal</td>
<td>None</td>
<td>Fighting</td>
<td>Immune to Ghost; Not Effective Against Ghost</td>
</tr>
<tr>
<td style="text-align: center;">Poison</td>
<td>Fairy, Grass</td>
<td>Ground, Psychic</td>
<td>Not Effective Against Steel</td>
</tr>
<tr>
<td style="text-align: center;">Psychic</td>
<td>Fighting, Poison</td>
<td>Bug, Dark, Ghost</td>
<td>Not Effective Against Dark</td>
</tr>
<tr>
<td style="text-align: center;">Rock</td>
<td>Bug, Fire, Flying, Ice</td>
<td>Fighting, Grass, Ground, Steel, Water</td>
<td style="text-align: center;">N/A</td>
</tr>
<tr>
<td style="text-align: center;">Steel</td>
<td>Fairy, Ice, Rock</td>
<td>Fighting, Fire, Ground</td>
<td>Immune to Poison</td>
</tr>
<tr>
<td style="text-align: center;">Water</td>
<td>Fire, Ground, Rock</td>
<td>Electric, Grass</td>
<td style="text-align: center;">N/A</td>
</tr>
</tbody>
</table>

#### <font color='#000000'>With this information, we can build interesting new features as follows.<font>

In [44]:
%%sql
--Printing the types
select Type_1 from pokemon group by 1;

18 rows affected.


Type_1
Ghost
Poison
Rock
Fighting
Psychic
Flying
Ice
Water
Fairy
Normal


In [55]:
%%sql
drop table if exists pokemon_combats;
create table pokemon_combats as
    select 
        Name1,
        Name as Name2,
        Legendary1::int,
        Legendary::int as Legendary2,
        (case 
             --Bug
             when Pokemon1_Type_1='Bug' and (Type_1='Dark' or Type_1='Grass' or Type_1='Psychic') then 2
             when Pokemon1_Type_1='Bug' and (Type_2='Dark' or Type_2='Grass' or Type_2='Psychic') then 1
             when Pokemon1_Type_1='Bug' and (Type_1='Fire' or Type_1='Flying' or Type_1='Rock') then -2
             when Pokemon1_Type_1='Bug' and (Type_2='Fire' or Type_2='Flying' or Type_2='Rock') then -1  
             --Dark
             when Pokemon1_Type_1='Dark' and Type_1='Psychic' then 3
             when Pokemon1_Type_1='Dark' and (Type_1='Ghost' or Type_2='Psychic') then 2
             when Pokemon1_Type_1='Dark' and (Type_2='Ghost') then 1
             when Pokemon1_Type_1='Dark' and (Type_1='Bug' or Type_1='Fairy' or Type_1='Fighting') then -2
             when Pokemon1_Type_1='Dark' and (Type_2='Bug' or Type_2='Fairy' or Type_2='Fighting') then -1
             --Dragon
             when Pokemon1_Type_1='Dragon' and Type_1='Fairy' then -3
             when Pokemon1_Type_1='Dragon' and (Type_1='Ice' or Type_2='Fairy') then -2
             when Pokemon1_Type_1='Dragon' and Type_2='Ice' then -1
             --Electric
             when Pokemon1_Type_1='Electric' and (Type_1='Flying' or Type_1='Water') then 2
             when Pokemon1_Type_1='Electric' and (Type_2='Flying' or Type_2='Water') then 1
             when Pokemon1_Type_1='Electric' and Type_1='Ground' then -3
             when Pokemon1_Type_1='Electric' and Type_2='Ground' then -2
             --Fairy
             when Pokemon1_Type_1='Fairy' and Type_1='Dragon' then 3
             when Pokemon1_Type_1='Fairy' and (Type_1='Dark' or Type_2='Dragon' or Type_1='Fighting') then 2
             when Pokemon1_Type_1='Fairy' and (Type_2='Dark' or Type_2='Fighting') then 1
             when Pokemon1_Type_1='Fairy' and (Type_1='Poison' or Type_1='Steel') then -2
             when Pokemon1_Type_1='Fairy' and (Type_2='Poison' or Type_2='Steel') then -1
             --Fighting
             when Pokemon1_Type_1='Fighting' and (Type_1='Dark' or Type_1='Ice' or Type_1='Normal' or Type_1='Rock' or Type_1='Steel') then 2
             when Pokemon1_Type_1='Fighting' and (Type_2='Dark' or Type_2='Ice' or Type_2='Normal' or Type_2='Rock' or Type_2='Steel') then 1
             when Pokemon1_Type_1='Fighting' and (Type_1='Fairy' or Type_1='Flying' or Type_1='Psychic' or Type_2='Ghost') then -2
             when Pokemon1_Type_1='Fighting' and (Type_2='Fairy' or Type_2='Flying' or Type_2='Psychic') then -1
             when Pokemon1_Type_1='Fighting' and Type_1='Ghost' then -3
             --Fire
             when Pokemon1_Type_1='Fire' and (Type_1='Bug' or Type_1='Grass' or Type_1='Ice' or Type_1='Steel') then 2
             when Pokemon1_Type_1='Fire' and (Type_2='Bug' or Type_2='Grass' or Type_2='Ice' or Type_2='Steel') then 1
             when Pokemon1_Type_1='Fire' and (Type_1='Ground' or Type_1='Rock' or Type_1='Water') then -2
             when Pokemon1_Type_1='Fire' and (Type_2='Ground' or Type_2='Rock' or Type_2='Water') then -1
             --Flying
             when Pokemon1_Type_1='Flying' and Type_1='Ground' then 3
             when Pokemon1_Type_1='Flying' and (Type_1='Bug' or Type_1='Fighting' or Type_1='Grass' or Type_2='Ground') then 2
             when Pokemon1_Type_1='Flying' and (Type_2='Bug' or Type_2='Fighting' or Type_2='Grass') then 1
             when Pokemon1_Type_1='Flying' and (Type_1='Electric' or Type_1='Ice' or Type_1='Rock') then -2
             when Pokemon1_Type_1='Flying' and (Type_2='Electric' or Type_2='Ice' or Type_2='Rock') then -1
             --Ghost
             when Pokemon1_Type_1='Ghost' and Type_1='Fighting' then 3
             when Pokemon1_Type_1='Ghost' and (Type_1='Psychic' or Type_2='Fighting') then 2
             when Pokemon1_Type_1='Ghost' and Type_2='Psychic' then 1
             when Pokemon1_Type_1='Ghost' and Type_1='Dark' then -2
             when Pokemon1_Type_1='Ghost' and Type_2='Dark' then -1
             --Grass
             when Pokemon1_Type_1='Grass' and (Type_1='Ground' or Type_1='Rock' or Type_1='Water') then 2
             when Pokemon1_Type_1='Grass' and (Type_2='Ground' or Type_2='Rock' or Type_2='Water') then 1
             when Pokemon1_Type_1='Grass' and (Type_1='Bug' or Type_1='Fire' or Type_1='Flying' or Type_1='Ice' or Type_1='Poison') then -2
             when Pokemon1_Type_1='Grass' and (Type_2='Bug' or Type_2='Fire' or Type_2='Flying' or Type_2='Ice' or Type_2='Poison') then -1
             --Ground
             when Pokemon1_Type_1='Ground' and Type_1='Electric' then 3
             when Pokemon1_Type_1='Ground' and (Type_1='Fire' or Type_1='Poison' or Type_1='Rock' or Type_1='Steel' or Type_2='Electric') then 2
             when Pokemon1_Type_1='Ground' and (Type_2='Fire' or Type_2='Poison' or Type_2='Rock' or Type_2='Steel') then 1
             when Pokemon1_Type_1='Ground' and Type_1='Flying' then -3
             when Pokemon1_Type_1='Ground' and (Type_1='Grass' or Type_1='Ice' or Type_1='Water' or Type_2='Flying') then -2
             when Pokemon1_Type_1='Ground' and (Type_2='Grass' or Type_2='Ice' or Type_2='Water') then -1
             --Ice
             when Pokemon1_Type_1='Ice' and (Type_1='Dragon' or Type_1='Flying' or Type_1='Grass' or Type_1='Ground') then 2
             when Pokemon1_Type_1='Ice' and (Type_2='Dragon' or Type_2='Flying' or Type_2='Grass' or Type_2='Ground') then 1
             when Pokemon1_Type_1='Ice' and (Type_1='Fighting' or Type_1='Fire' or Type_1='Rock' or Type_1='Steel') then -2
             when Pokemon1_Type_1='Ice' and (Type_2='Fighting' or Type_2='Fire' or Type_2='Rock' or Type_2='Steel') then -1
             --Normal
             when Pokemon1_Type_1='Normal' and Type_1='Ghost' then 3
             when Pokemon1_Type_1='Normal' and Type_2='Ghost' then 2
             when Pokemon1_Type_1='Normal' and Type_1='Fighting' then -2
             when Pokemon1_Type_1='Normal' and Type_2='Fighting' then -1             
             --Poison
             when Pokemon1_Type_1='Poison' and (Type_1='Fairy' or Type_1='Grass') then 2
             when Pokemon1_Type_1='Poison' and (Type_2='Fairy' or Type_2='Grass') then 1
             when Pokemon1_Type_1='Poison' and (Type_1='Ground' or Type_1='Psychic' or Type_2='Steel') then -2
             when Pokemon1_Type_1='Poison' and (Type_2='Ground' or Type_2='Psychic') then -1
             when Pokemon1_Type_1='Poison' and Type_1='Steel' then -3            
             --Psychic
             when Pokemon1_Type_1='Psychic' and (Type_1='Fighting' or Type_1='Poison') then 2
             when Pokemon1_Type_1='Psychic' and (Type_2='Fighting' or Type_2='Poison') then 1
             when Pokemon1_Type_1='Psychic' and (Type_1='Bug' or Type_1='Ghost' or Type_2='Dark') then -2
             when Pokemon1_Type_1='Psychic' and (Type_2='Bug' or Type_2='Ghost') then -1
             when Pokemon1_Type_1='Psychic' and Type_1='Dark' then -3
             --Rock
             when Pokemon1_Type_1='Rock' and (Type_1='Bug' or Type_1='Fire' or Type_1='Flying' or Type_1='Ice') then 2
             when Pokemon1_Type_1='Rock' and (Type_2='Bug' or Type_2='Fire' or Type_2='Flying' or Type_2='Ice') then 1
             when Pokemon1_Type_1='Rock' and (Type_1='Fighting' or Type_1='Grass' or Type_1='Ground' or Type_1='Steel' or Type_1='Water') then -2
             when Pokemon1_Type_1='Rock' and (Type_2='Fighting' or Type_2='Grass' or Type_2='Ground' or Type_2='Steel' or Type_2='Water') then -1
             --Steel
             when Pokemon1_Type_1='Steel' and Type_1='Poison' then 3
             when Pokemon1_Type_1='Steel' and (Type_1='Fairy' or Type_1='Ice' or Type_1='Rock' or Type_2='Poison') then 2
             when Pokemon1_Type_1='Steel' and (Type_2='Fairy' or Type_2='Ice' or Type_2='Rock') then 1
             when Pokemon1_Type_1='Steel' and (Type_1='Fighting' or Type_1='Fire' or Type_1='Ground') then -2
             when Pokemon1_Type_1='Steel' and (Type_2='Fighting' or Type_2='Fire' or Type_2='Ground') then -1            
             --Water
             when Pokemon1_Type_1='Water' and (Type_1='Fire' or Type_1='Ground' or Type_1='Rock') then 2
             when Pokemon1_Type_1='Water' and (Type_2='Fire' or Type_2='Ground' or Type_2='Rock') then 1
             when Pokemon1_Type_1='Water' and (Type_1='Electric' or Type_1='Grass') then -2
             when Pokemon1_Type_1='Water' and (Type_2='Electric' or Type_2='Grass') then -1
             --Else
             else 0
         end) as Type1,
        (case 
             --Bug
             when Pokemon1_Type_2='Bug' and (Type_1='Dark' or Type_1='Grass' or Type_1='Psychic') then 2
             when Pokemon1_Type_2='Bug' and (Type_2='Dark' or Type_2='Grass' or Type_2='Psychic') then 1
             when Pokemon1_Type_2='Bug' and (Type_1='Fire' or Type_1='Flying' or Type_1='Rock') then -2
             when Pokemon1_Type_2='Bug' and (Type_2='Fire' or Type_2='Flying' or Type_2='Rock') then -1  
             --Dark
             when Pokemon1_Type_2='Dark' and Type_1='Psychic' then 3
             when Pokemon1_Type_2='Dark' and (Type_1='Ghost' or Type_2='Psychic') then 2
             when Pokemon1_Type_2='Dark' and (Type_2='Ghost') then 1
             when Pokemon1_Type_2='Dark' and (Type_1='Bug' or Type_1='Fairy' or Type_1='Fighting') then -2
             when Pokemon1_Type_2='Dark' and (Type_2='Bug' or Type_2='Fairy' or Type_2='Fighting') then -1
             --Dragon
             when Pokemon1_Type_2='Dragon' and Type_1='Fairy' then -3
             when Pokemon1_Type_2='Dragon' and (Type_1='Ice' or Type_2='Fairy') then -2
             when Pokemon1_Type_2='Dragon' and Type_2='Ice' then -1
             --Electric
             when Pokemon1_Type_2='Electric' and (Type_1='Flying' or Type_1='Water') then 2
             when Pokemon1_Type_2='Electric' and (Type_2='Flying' or Type_2='Water') then 1
             when Pokemon1_Type_2='Electric' and Type_1='Ground' then -3
             when Pokemon1_Type_2='Electric' and Type_2='Ground' then -2
             --Fairy
             when Pokemon1_Type_2='Fairy' and Type_1='Dragon' then 3
             when Pokemon1_Type_2='Fairy' and (Type_1='Dark' or Type_2='Dragon' or Type_1='Fighting') then 2
             when Pokemon1_Type_2='Fairy' and (Type_2='Dark' or Type_2='Fighting') then 1
             when Pokemon1_Type_2='Fairy' and (Type_1='Poison' or Type_1='Steel') then -2
             when Pokemon1_Type_2='Fairy' and (Type_2='Poison' or Type_2='Steel') then -1
             --Fighting
             when Pokemon1_Type_2='Fighting' and (Type_1='Dark' or Type_1='Ice' or Type_1='Normal' or Type_1='Rock' or Type_1='Steel') then 2
             when Pokemon1_Type_2='Fighting' and (Type_2='Dark' or Type_2='Ice' or Type_2='Normal' or Type_2='Rock' or Type_2='Steel') then 1
             when Pokemon1_Type_2='Fighting' and (Type_1='Fairy' or Type_1='Flying' or Type_1='Psychic' or Type_2='Ghost') then -2
             when Pokemon1_Type_2='Fighting' and (Type_2='Fairy' or Type_2='Flying' or Type_2='Psychic') then -1
             when Pokemon1_Type_2='Fighting' and Type_1='Ghost' then -3
             --Fire
             when Pokemon1_Type_2='Fire' and (Type_1='Bug' or Type_1='Grass' or Type_1='Ice' or Type_1='Steel') then 2
             when Pokemon1_Type_2='Fire' and (Type_2='Bug' or Type_2='Grass' or Type_2='Ice' or Type_2='Steel') then 1
             when Pokemon1_Type_2='Fire' and (Type_1='Ground' or Type_1='Rock' or Type_1='Water') then -2
             when Pokemon1_Type_2='Fire' and (Type_2='Ground' or Type_2='Rock' or Type_2='Water') then -1
             --Flying
             when Pokemon1_Type_2='Flying' and Type_1='Ground' then 3
             when Pokemon1_Type_2='Flying' and (Type_1='Bug' or Type_1='Fighting' or Type_1='Grass' or Type_2='Ground') then 2
             when Pokemon1_Type_2='Flying' and (Type_2='Bug' or Type_2='Fighting' or Type_2='Grass') then 1
             when Pokemon1_Type_2='Flying' and (Type_1='Electric' or Type_1='Ice' or Type_1='Rock') then -2
             when Pokemon1_Type_2='Flying' and (Type_2='Electric' or Type_2='Ice' or Type_2='Rock') then -1
             --Ghost
             when Pokemon1_Type_2='Ghost' and Type_1='Fighting' then 3
             when Pokemon1_Type_2='Ghost' and (Type_1='Psychic' or Type_2='Fighting') then 2
             when Pokemon1_Type_2='Ghost' and Type_2='Psychic' then 1
             when Pokemon1_Type_2='Ghost' and Type_1='Dark' then -2
             when Pokemon1_Type_2='Ghost' and Type_2='Dark' then -1
             --Grass
             when Pokemon1_Type_2='Grass' and (Type_1='Ground' or Type_1='Rock' or Type_1='Water') then 2
             when Pokemon1_Type_2='Grass' and (Type_2='Ground' or Type_2='Rock' or Type_2='Water') then 1
             when Pokemon1_Type_2='Grass' and (Type_1='Bug' or Type_1='Fire' or Type_1='Flying' or Type_1='Ice' or Type_1='Poison') then -2
             when Pokemon1_Type_2='Grass' and (Type_2='Bug' or Type_2='Fire' or Type_2='Flying' or Type_2='Ice' or Type_2='Poison') then -1
             --Ground
             when Pokemon1_Type_2='Ground' and Type_1='Electric' then 3
             when Pokemon1_Type_2='Ground' and (Type_1='Fire' or Type_1='Poison' or Type_1='Rock' or Type_1='Steel' or Type_2='Electric') then 2
             when Pokemon1_Type_2='Ground' and (Type_2='Fire' or Type_2='Poison' or Type_2='Rock' or Type_2='Steel') then 1
             when Pokemon1_Type_2='Ground' and Type_1='Flying' then -3
             when Pokemon1_Type_2='Ground' and (Type_1='Grass' or Type_1='Ice' or Type_1='Water' or Type_2='Flying') then -2
             when Pokemon1_Type_2='Ground' and (Type_2='Grass' or Type_2='Ice' or Type_2='Water') then -1
             --Ice
             when Pokemon1_Type_2='Ice' and (Type_1='Dragon' or Type_1='Flying' or Type_1='Grass' or Type_1='Ground') then 2
             when Pokemon1_Type_2='Ice' and (Type_2='Dragon' or Type_2='Flying' or Type_2='Grass' or Type_2='Ground') then 1
             when Pokemon1_Type_2='Ice' and (Type_1='Fighting' or Type_1='Fire' or Type_1='Rock' or Type_1='Steel') then -2
             when Pokemon1_Type_2='Ice' and (Type_2='Fighting' or Type_2='Fire' or Type_2='Rock' or Type_2='Steel') then -1
             --Normal
             when Pokemon1_Type_2='Normal' and Type_1='Ghost' then 3
             when Pokemon1_Type_2='Normal' and Type_2='Ghost' then 2
             when Pokemon1_Type_2='Normal' and Type_1='Fighting' then -2
             when Pokemon1_Type_2='Normal' and Type_2='Fighting' then -1             
             --Poison
             when Pokemon1_Type_2='Poison' and (Type_1='Fairy' or Type_1='Grass') then 2
             when Pokemon1_Type_2='Poison' and (Type_2='Fairy' or Type_2='Grass') then 1
             when Pokemon1_Type_2='Poison' and (Type_1='Ground' or Type_1='Psychic' or Type_2='Steel') then -2
             when Pokemon1_Type_2='Poison' and (Type_2='Ground' or Type_2='Psychic') then -1
             when Pokemon1_Type_2='Poison' and Type_1='Steel' then -3            
             --Psychic
             when Pokemon1_Type_2='Psychic' and (Type_1='Fighting' or Type_1='Poison') then 2
             when Pokemon1_Type_2='Psychic' and (Type_2='Fighting' or Type_2='Poison') then 1
             when Pokemon1_Type_2='Psychic' and (Type_1='Bug' or Type_1='Ghost' or Type_2='Dark') then -2
             when Pokemon1_Type_2='Psychic' and (Type_2='Bug' or Type_2='Ghost') then -1
             when Pokemon1_Type_2='Psychic' and Type_1='Dark' then -3
             --Rock
             when Pokemon1_Type_2='Rock' and (Type_1='Bug' or Type_1='Fire' or Type_1='Flying' or Type_1='Ice') then 2
             when Pokemon1_Type_2='Rock' and (Type_2='Bug' or Type_2='Fire' or Type_2='Flying' or Type_2='Ice') then 1
             when Pokemon1_Type_2='Rock' and (Type_1='Fighting' or Type_1='Grass' or Type_1='Ground' or Type_1='Steel' or Type_1='Water') then -2
             when Pokemon1_Type_2='Rock' and (Type_2='Fighting' or Type_2='Grass' or Type_2='Ground' or Type_2='Steel' or Type_2='Water') then -1
             --Steel
             when Pokemon1_Type_2='Steel' and Type_1='Poison' then 3
             when Pokemon1_Type_2='Steel' and (Type_1='Fairy' or Type_1='Ice' or Type_1='Rock' or Type_2='Poison') then 2
             when Pokemon1_Type_2='Steel' and (Type_2='Fairy' or Type_2='Ice' or Type_2='Rock') then 1
             when Pokemon1_Type_2='Steel' and (Type_1='Fighting' or Type_1='Fire' or Type_1='Ground') then -2
             when Pokemon1_Type_2='Steel' and (Type_2='Fighting' or Type_2='Fire' or Type_2='Ground') then -1            
             --Water
             when Pokemon1_Type_2='Water' and (Type_1='Fire' or Type_1='Ground' or Type_1='Rock') then 2
             when Pokemon1_Type_2='Water' and (Type_2='Fire' or Type_2='Ground' or Type_2='Rock') then 1
             when Pokemon1_Type_2='Water' and (Type_1='Electric' or Type_1='Grass') then -2
             when Pokemon1_Type_2='Water' and (Type_2='Electric' or Type_2='Grass') then -1
             --Else
             else 0
         end) as Type2,
        x.HP-pokemon.HP as diff_HP,
        x.Speed-pokemon.Speed as diff_Speed,
        x.Attack-pokemon.Attack as diff_Attack,
        x.Defense-pokemon.Defense as diff_Defense,
        x.Sp_Atk-pokemon.Sp_Atk as diff_Sp_Atk,
        x.Sp_Def-pokemon.Sp_Def as diff_Sp_Def,
        x.Winner::int
    from
        (select
            Name as Name1,
            Second_Pokemon,
            Winner=First_Pokemon::int as Winner,
            Attack,
            Defense,
            HP,
            Sp_Atk,
            Sp_Def,
            Speed,
            Legendary as Legendary1,
            Type_1 as Pokemon1_Type_1,
            Type_2 as Pokemon1_Type_2
        from combats left join pokemon on combats.First_pokemon=pokemon.ID) x
        left join pokemon on x.Second_Pokemon=pokemon.ID;

Done.
Done.


[]
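
The scoring rule used in the query above can be read more easily in a data-driven form. The sketch below is a simplified Python version restricted to four types that have no immunity entry in the table (Bug, Fire, Grass, Water); the function name `type_score` and the dictionaries are hypothetical. It applies the same order of checks as the SQL: +2 or +1 when the opponent's first or second type is in the "strong against" list, then -2 or -1 for the "weak against" list, and 0 otherwise.

```python
# Subset of the table above, limited to types without immunity rules.
STRONG_AGAINST = {
    "Bug": {"Dark", "Grass", "Psychic"},
    "Fire": {"Bug", "Grass", "Ice", "Steel"},
    "Grass": {"Ground", "Rock", "Water"},
    "Water": {"Fire", "Ground", "Rock"},
}
WEAK_AGAINST = {
    "Bug": {"Fire", "Flying", "Rock"},
    "Fire": {"Ground", "Rock", "Water"},
    "Grass": {"Bug", "Fire", "Flying", "Ice", "Poison"},
    "Water": {"Electric", "Grass"},
}


def type_score(own_type, opponent_type_1, opponent_type_2=None):
    """Score one type of Pokemon1 against both types of Pokemon2."""
    if own_type not in STRONG_AGAINST:
        return 0  # missing type, or a type outside this simplified subset
    if opponent_type_1 in STRONG_AGAINST[own_type]:
        return 2
    if opponent_type_2 in STRONG_AGAINST[own_type]:
        return 1
    if opponent_type_1 in WEAK_AGAINST[own_type]:
        return -2
    if opponent_type_2 in WEAK_AGAINST[own_type]:
        return -1
    return 0


print(type_score("Fire", "Bug"))             # Charmeleon against Metapod
print(type_score("Bug", "Fire"))             # Metapod against Charmeleon
print(type_score("Water", "Normal", "Fire"))
```

Keeping the rules in dictionaries makes an asymmetric entry or a typo much easier to spot than in a long `case` expression.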

#### <font color='#000000'>Try to understand the new variables. Why is data preparation so important?<font>

...

#### <font color='#000000'>5.8- Draw a hexbin plot of Type1 vs Type2 (the two new variables). Why will they not help to improve the model?<font>

...

### <font color='#333333'>6- A final algorithm<font>
    
In this part, we will use the most suitable algorithm and try to optimize it.

#### <font color='#000000'>6.0- Which Vertica ML algorithm is the most suitable for this use case?<font>

...

#### <font color='#000000'>6.1- Build a new table for the ML algorithm you will use.<font>

In [None]:
%%sql


#### <font color='#000000'>6.2- Split your data into train/test sets.<font>

#### <font color='#000000'>6.3- Create your model using default parameters.<font>

#### <font color='#000000'>6.4- Evaluate your model.<font>

...

#### <font color='#000000'>6.5- Improve your model (parameter tuning). Be careful of overfitting!<font>

...

#### <font color='#000000'>6.6- Add the prediction to the RVD and analyse wrong predictions.<font>

...

### <font color='#333333'>7- Your impressions.<font>

Your last task is to send me your impressions of the library with the subject "[Vertica-ML-Python] My impressions". I want to personally thank you for finishing this first homework. The most valuable companies always have highly skilled people. Vertica is one of them, and it needs to keep creating and innovating.

### <font color='#1e90ff'>To contact me<font>

<b>@: </b><a href="mailto:badr.ouali@microfocus.com">badr.ouali@microfocus.com</a><br>
<b>In: </b><a href="https://www.linkedin.com/in/badr-ouali/">badr-ouali</a>