Platform
Metacritic

## 1. Problem description and exploratory analysis
### Loading the libraries
First, we load the libraries used to work with the data:

- NumPy, to create and operate on N-dimensional arrays
- pandas, to load and manipulate tabular data
- matplotlib, to create plots

We import the libraries under their conventional aliases and enable inline rendering of plots in the notebook.

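```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import surprise

# Render plots inline in the notebook
%matplotlib inline

# Last expression in the cell, so the notebook displays the installed version
surprise.__version__
```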

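### Loading the data
```python
import os.path
from urllib.request import urlretrieve

# Placeholders: replace with the actual file name and download URL
file = "nome.csv"
url = "url"

# Download the file only if it is not already available locally
if not os.path.exists(file):
    urlretrieve(url, file)

data = pd.read_csv(file, index_col=0)
data.info()
```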

### Problem description
The dataset under examination contains useful information ...

### Data exploration
```python
# Example of data exploration
data.head()
data.describe()
```

```python
# Plot the distribution of each numeric column
data.hist(figsize=(10, 8))
plt.show()
```

---

## 2. Feature preprocessing
Example of preprocessing:
```python
# Handle missing values
data = data.fillna(0)
# Encode categorical variables
data = pd.get_dummies(data)
```
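The two steps above apply one rule to every column. As a minimal sketch of a per-column alternative, the example below fills a numeric column with its median and a categorical column with an explicit label before one-hot encoding. The toy frame and the `"Unknown"` label are assumptions made for illustration, not part of the project data.

```python
import pandas as pd

# Toy frame (hypothetical values) with one categorical and one numeric column
toy = pd.DataFrame({
    "rating": ["Positive", "Mixed", None],
    "price_final": [9.99, None, 14.99],
})

# Numeric column: fill missing values with the column median
toy["price_final"] = toy["price_final"].fillna(toy["price_final"].median())

# Categorical column: make missingness an explicit category
toy["rating"] = toy["rating"].fillna("Unknown")

# One-hot encode only the categorical column
encoded = pd.get_dummies(toy, columns=["rating"])
print(encoded)
```

Filling every gap with 0 can distort numeric features such as prices, so a per-column strategy may be worth considering once the real columns are known.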

---

## 3. Modeling
Example of modeling:
```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier()
model.fit(X_train, y_train)
```

---

## 4. Evaluation of the classification models
Example of evaluation:
```python
from sklearn.metrics import classification_report, accuracy_score

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print('Accuracy:', accuracy_score(y_test, y_pred))
```

---

## 5. Analysis of the best model
Example of analysis:
```python
# Most important features, sorted by decreasing importance
importances = model.feature_importances_
features = X.columns
for feat, imp in sorted(zip(features, importances), key=lambda x: x[1], reverse=True):
    print(f"{feat}: {imp:.3f}")
```



# to be commented

In [None]:
!conda update -n base -c defaults conda
!conda install -c conda-forge scikit-surprise
!conda install numpy=1.26.4
!pip install -q kaggle
!pip install jovian --upgrade --quiet

## 1. Problem description and exploratory analysis
### Loading the libraries
First, we load the libraries used to work with the data:

- NumPy, to create and operate on N-dimensional arrays
- pandas, to load and manipulate tabular data
- matplotlib, to create plots

We import the libraries under their conventional aliases and enable inline rendering of plots in the notebook.


In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import surprise
import jovian
from sklearn.datasets import fetch_openml


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.3.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\manto\AppData\Roaming\Python\Python311\site-packages\ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "C:\Users\manto\AppData\Roaming\Python\Python311\site-packages\traitlets\config\application.py", line 1075, in launch_instance
    app.start()
  File "C:\Users\manto\AppData\Roaming\Python\Python311\site-packages\ipykernel\kernelapp.py", line 739, in start
    self.io_lo

ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use '<void>numpy._import_array' to disable if you are certain you don't need it).
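The error above says that a compiled module was built against NumPy 1.x while NumPy 2.3.2 is installed, and it suggests downgrading to `numpy<2`. As a minimal sketch, the check below reports the installed major version before any compiled extension is imported. The helper name `numpy_major` is hypothetical, and the pinned version in the message is the one used in the install cell of this notebook.

```python
import numpy as np

def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '1.26.4'."""
    return int(version.split(".")[0])

if numpy_major(np.__version__) >= 2:
    # Extensions compiled against NumPy 1.x may fail to import under NumPy 2.x
    print(f"NumPy {np.__version__} detected: consider installing numpy=1.26.4")
else:
    print(f"NumPy {np.__version__} detected: compatible with 1.x builds")
```

After changing the installed NumPy version, the kernel has to be restarted for the new version to be picked up.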

In [3]:
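# DO NOT RE-RUN
import os

# os.path.exists does not expand "~", so the path is expanded explicitly
kaggle_token = os.path.expanduser("~/.kaggle/kaggle.json")
# Install the API token only if a local copy exists and it is not installed yet
if os.path.exists("kaggle.json") and not os.path.exists(kaggle_token):
    ! mkdir -p ~/.kaggle
    ! cp kaggle.json ~/.kaggle/
    ! chmod 600 ~/.kaggle/kaggle.json
    ! kaggle datasets list
    !kaggle datasets download antonkozyriev/game-recommendations-on-steam
    !mkdir games
    !unzip game-recommendations-on-steam.zip -d games_dataset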

## Main dataset

In [4]:
bigdata = pd.read_csv("games_dataset/games.csv", index_col=0)
# Check how the steam_deck values are distributed before dropping the column
bigdata['steam_deck'].value_counts()
bigdata.drop(columns=["steam_deck"], inplace=True)
bigdata

| app_id | title | date_release | win | mac | linux | rating | positive_ratio | user_reviews | price_final | price_original | discount |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 13500 | Prince of Persia: Warrior Within™ | 2008-11-21 | True | False | False | Very Positive | 84 | 2199 | 9.99 | 9.99 | 0.0 |
| 22364 | BRINK: Agents of Change | 2011-08-03 | True | False | False | Positive | 85 | 21 | 2.99 | 2.99 | 0.0 |
| 113020 | Monaco: What's Yours Is Mine | 2013-04-24 | True | True | True | Very Positive | 92 | 3722 | 14.99 | 14.99 | 0.0 |
| 226560 | Escape Dead Island | 2014-11-18 | True | False | False | Mixed | 61 | 873 | 14.99 | 14.99 | 0.0 |
| 249050 | Dungeon of the ENDLESS™ | 2014-10-27 | True | True | False | Very Positive | 88 | 8784 | 11.99 | 11.99 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2296380 | I Expect You To Die 3: Cog in the Machine | 2023-09-28 | True | False | False | Very Positive | 96 | 101 | 22.00 | 0.00 | 0.0 |
| 1272080 | PAYDAY 3 | 2023-09-21 | True | False | False | Mostly Negative | 38 | 29458 | 40.00 | 0.00 | 0.0 |
| 1402110 | Eternights | 2023-09-11 | True | False | False | Very Positive | 89 | 1128 | 30.00 | 0.00 | 0.0 |
| 2272250 | Forgive Me Father 2 | 2023-10-19 | True | False | False | Very Positive | 95 | 82 | 17.00 | 0.00 | 0.0 |


## Editing the dataset: additional data

Some features are not relevant to our problem, so we can remove those columns from the dataframe to save further space.

In [5]:
# Editing the dataset: additional data
df = fetch_openml(data_id=43689)
# Keep only the columns of interest
secdata = pd.DataFrame({
    "SteamURL": df.data["SteamURL"],
    "Metacritic": df.data["Metacritic"],
    "Platform": df.data["Platform"],
    "Tags": df.data["Tags"],
    "Languages": df.data["Languages"]
})
# Drop rows that lack a Steam URL or tags
secdata = secdata.dropna(axis='rows', subset=["SteamURL", "Tags"])
# Slice the app id out of the URL by fixed character offsets
secdata['SteamURL'] = secdata['SteamURL'].str[35:-16]
secdata.drop_duplicates(subset=['SteamURL'], inplace=True)
# Use the numeric app id as the index, so it matches the index of bigdata
secdata['SteamURL'] = secdata['SteamURL'].astype('int64')
secdata.index = secdata['SteamURL']
secdata.drop(columns='SteamURL', inplace=True)
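The slice `str[35:-16]` above only works when every URL has the same prefix and suffix lengths. As a hedged sketch of a more tolerant alternative, the example below pulls the app id out with a regular expression. The sample URLs are assumptions about the `SteamURL` format, inferred from the offsets used in the cell rather than taken from the dataset itself.

```python
import pandas as pd

# Hypothetical sample of SteamURL values; the real format is an assumption
urls = pd.Series([
    "https://store.steampowered.com/app/13500/?snr=1_7_15__13",
    "https://store.steampowered.com/app/113020/?snr=1_7_15__13",
    "https://store.steampowered.com/app/249050/",
])

# Fixed-offset slicing, as in the cell above: breaks on the third URL
sliced = urls.str[35:-16]

# Regex extraction: take the digits that follow "/app/", whatever comes after
app_ids = urls.str.extract(r"/app/(\d+)", expand=False).astype("int64")

print(sliced.tolist())
print(app_ids.tolist())
```

With the regular expression, a URL whose query string is missing or has a different length still yields its id, while the fixed slice returns an empty string for it.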

In [6]:
# Editing the dataset: tags
# lines expects a boolean, not the string "True"
metadata = pd.read_json("games_dataset/games_metadata.json", lines=True)
metadata.to_csv("data3.csv", encoding='utf-8', index=False)
tagsdata = pd.read_csv("data3.csv", index_col=0)
tagsdata = tagsdata.drop(columns='description')
# After the CSV round trip the tags are strings, so an empty list "[]" has length 2
tagsdata.drop(tagsdata[tagsdata['tags'].str.len() == 2].index, inplace=True)
tagsdata.index

Index([  13500,   22364,  113020,  226560,  249050,  250180,  253980,  271850,
        282900,   19810,
       ...
       1426440, 2446110, 2205850, 2380280, 2515240, 2455060, 1138640, 2515460,
       1687000, 2272250],
      dtype='int64', name='app_id', length=49628)
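Because the metadata is written to CSV and read back, the `tags` column holds strings such as `"['Action']"` rather than Python lists, which is why empty entries are detected by a string length of 2. The sketch below shows one possible way to turn those strings back into lists with `ast.literal_eval`. The toy values are assumptions for illustration only.

```python
import ast
import pandas as pd

# Toy tags column as it looks after a CSV round trip (hypothetical values)
tags = pd.Series(
    ["['Action', 'Adventure']", "[]", "['Indie']"],
    index=pd.Index([13500, 22364, 113020], name="app_id"),
)

# "[]" is two characters long, so this flags games without any tag
is_empty = tags.str.len() == 2

# Parse the remaining strings into real Python lists
parsed = tags[~is_empty].apply(ast.literal_eval)
print(parsed)
```

Working with real lists makes later steps such as counting tags or calling `explode()` straightforward.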

In [7]:
# Merge the datasets on the app_id index
tempdata = pd.merge(how='left', left=bigdata, right=secdata, left_index=True, right_index=True)
data = pd.merge(how='left', left=tempdata, right=tagsdata, left_index=True, right_index=True)
# Drop the OpenML 'Tags' column and keep the 'tags' column from the Kaggle metadata
data.drop(columns='Tags', inplace=True)
data

| app_id | title | date_release | win | mac | linux | rating | positive_ratio | user_reviews | price_final | price_original | discount | Metacritic | Platform | Languages | tags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13500 | Prince of Persia: Warrior Within™ | 2008-11-21 | True | False | False | Very Positive | 84 | 2199 | 9.99 | 9.99 | 0.0 | 83.0 | PC, PlayStation 2, Xbox, GameCube, PlayStation 3 | English | ['Action', 'Adventure', 'Parkour', 'Third Pers... |
| 22364 | BRINK: Agents of Change | 2011-08-03 | True | False | False | Positive | 85 | 21 | 2.99 | 2.99 | 0.0 | | | | ['Action'] |
| 113020 | Monaco: What's Yours Is Mine | 2013-04-24 | True | True | True | Very Positive | 92 | 3722 | 14.99 | 14.99 | 0.0 | 83.0 | PC, Linux, macOS, Xbox 360 | English | ['Co-op', 'Stealth', 'Indie', 'Heist', 'Local ... |
| 226560 | Escape Dead Island | 2014-11-18 | True | False | False | Mixed | 61 | 873 | 14.99 | 14.99 | 0.0 | 52.0 | PC, PlayStation 3, Xbox 360 | English, French, Italian, German, Russian, Polish | ['Zombies', 'Adventure', 'Survival', 'Action',... |
| 249050 | Dungeon of the ENDLESS™ | 2014-10-27 | True | True | False | Very Positive | 88 | 8784 | 11.99 | 11.99 | 0.0 | 77.0 | PC, macOS | English, French, German | ['Roguelike', 'Strategy', 'Tower Defense', 'Pi... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2296380 | I Expect You To Die 3: Cog in the Machine | 2023-09-28 | True | False | False | Very Positive | 96 | 101 | 22.00 | 0.00 | 0.0 | | | | |
| 1272080 | PAYDAY 3 | 2023-09-21 | True | False | False | Mostly Negative | 38 | 29458 | 40.00 | 0.00 | 0.0 | | | | |
| 1402110 | Eternights | 2023-09-11 | True | False | False | Very Positive | 89 | 1128 | 30.00 | 0.00 | 0.0 | | | | |
| 2272250 | Forgive Me Father 2 | 2023-10-19 | True | False | False | Very Positive | 95 | 82 | 17.00 | 0.00 | 0.0 | | | | ['Early Access', 'FPS', 'Action', 'Retro', 'Fi... |
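The output shows that a left merge keeps every game from the main dataset and leaves the added columns empty when no match exists. As a small sketch of how that coverage could be measured, the example below repeats the same kind of index-based left merge on toy frames and uses the `indicator=True` option of `pd.merge`. The frame names and values are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the main dataset and the additional data (hypothetical)
games = pd.DataFrame(
    {"title": ["A", "B", "C"]},
    index=pd.Index([1, 2, 3], name="app_id"),
)
extra = pd.DataFrame(
    {"Metacritic": [83.0, 77.0]},
    index=pd.Index([1, 3], name="app_id"),
)

# Left merge on the index; the _merge column records where each row came from
merged = pd.merge(
    games, extra, how="left", left_index=True, right_index=True, indicator=True
)

# Number of games with and without a match in the additional data
matched = int((merged["_merge"] == "both").sum())
unmatched = int((merged["_merge"] == "left_only").sum())
print(matched, unmatched)
```

Counting the `left_only` rows gives a quick estimate of how many missing values the merged columns will contain before any imputation is chosen.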


In [None]:
rec_data=pd.read_csv("games_dataset/recommendations.csv", index_col=0)
rec_data
