# Imports

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from keras.models import Sequential
from keras.layers import Dense

pd.set_option('display.max_columns', 200)

# Introduction

# Datasets

## Synthetic dataset

## Real dataset: Spotify and YouTube dataset

**Link to the dataset**: https://www.kaggle.com/datasets/salvatorerastelli/spotify-and-youtube?resource=download


This dataset contains songs by artists from around the world. For each song, it records several statistics about the track on Spotify, such as the number of streams, along with the number of views of the song's official music video on YouTube.

### Content

It includes 26 variables for each song collected from Spotify. Several of these variables are briefly described below:

- Track: name of the song, as visible on the Spotify platform.
- Artist: name of the artist.
- Url_spotify: the URL of the artist.
- Album: the album in which the song is contained on Spotify.
- Album_type: indicates whether the song is released on Spotify as a single or as part of an album.
- Uri: a Spotify link used to find the song through the API.
- Danceability: describes how suitable a track is for dancing based on a combination of musical elements.
- Energy: a measure from 0.0 to 1.0 representing the perceived intensity and activity of a track.
- Key: the key the track is in. 
- Loudness: the overall loudness of a track in decibels (dB).
- Speechiness: detects the presence of spoken words in a track. 
- Acousticness: a confidence measure from 0.0 to 1.0 of whether the track is acoustic.
- Liveness: detects the presence of an audience in the recording. 
- Valence: a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
- Tempo: the overall estimated tempo of a track in beats per minute (BPM). 
- Duration_ms: the duration of the track in milliseconds.
- Stream: number of streams of the song on Spotify.

In [3]:
# Load the dataset into a DataFrame named df, using the first CSV column as the index
df = pd.read_csv('Spotify_Youtube.csv', index_col=0)
df.head()

Unnamed: 0,Artist,Url_spotify,Track,Album,Album_type,Uri,Danceability,Energy,Key,Loudness,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Duration_ms,Url_youtube,Title,Channel,Views,Likes,Comments,Description,Licensed,official_video,Stream
0,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,Feel Good Inc.,Demon Days,album,spotify:track:0d28khcov6AiegSCpG5TuT,0.818,0.705,6.0,-6.679,0.177,0.00836,0.00233,0.613,0.772,138.559,222640.0,https://www.youtube.com/watch?v=HyHNuVaZJ-k,Gorillaz - Feel Good Inc. (Official Video),Gorillaz,693555221.0,6220896.0,169907.0,Official HD Video for Gorillaz' fantastic trac...,True,True,1040235000.0
1,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,Rhinestone Eyes,Plastic Beach,album,spotify:track:1foMv2HQwfQ2vntFf9HFeG,0.676,0.703,8.0,-5.815,0.0302,0.0869,0.000687,0.0463,0.852,92.761,200173.0,https://www.youtube.com/watch?v=yYDmaexVHic,Gorillaz - Rhinestone Eyes [Storyboard Film] (...,Gorillaz,72011645.0,1079128.0,31003.0,The official video for Gorillaz - Rhinestone E...,True,True,310083700.0
2,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,New Gold (feat. Tame Impala and Bootie Brown),New Gold (feat. Tame Impala and Bootie Brown),single,spotify:track:64dLd6rVqDLtkXFYrEUHIU,0.695,0.923,1.0,-3.93,0.0522,0.0425,0.0469,0.116,0.551,108.014,215150.0,https://www.youtube.com/watch?v=qJa-VFwPpYA,Gorillaz - New Gold ft. Tame Impala & Bootie B...,Gorillaz,8435055.0,282142.0,7399.0,Gorillaz - New Gold ft. Tame Impala & Bootie B...,True,True,63063470.0
3,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,On Melancholy Hill,Plastic Beach,album,spotify:track:0q6LuUqGLUiCPP1cbdwFs3,0.689,0.739,2.0,-5.81,0.026,1.5e-05,0.509,0.064,0.578,120.423,233867.0,https://www.youtube.com/watch?v=04mfKJWDSzI,Gorillaz - On Melancholy Hill (Official Video),Gorillaz,211754952.0,1788577.0,55229.0,Follow Gorillaz online:\nhttp://gorillaz.com \...,True,True,434663600.0
4,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,Clint Eastwood,Gorillaz,album,spotify:track:7yMiX7n9SBvadzox8T5jzT,0.663,0.694,10.0,-8.627,0.171,0.0253,0.0,0.0698,0.525,167.953,340920.0,https://www.youtube.com/watch?v=1V_xRb0x9aw,Gorillaz - Clint Eastwood (Official Video),Gorillaz,618480958.0,6197318.0,155930.0,The official music video for Gorillaz - Clint ...,True,True,617259700.0
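
Since the preview mixes text columns (names, URLs, descriptions) with numeric audio features, it can help to separate the two before any modelling. The following is a minimal sketch, not the notebook's own preprocessing: it uses a tiny hand-built frame called `sample` (a hypothetical stand-in for `df`, filled with values copied from the first three preview rows) so that it runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for df: a few columns and values copied from the preview
sample = pd.DataFrame({
    'Artist': ['Gorillaz', 'Gorillaz', 'Gorillaz'],
    'Album_type': ['album', 'album', 'single'],
    'Danceability': [0.818, 0.676, 0.695],
    'Energy': [0.705, 0.703, 0.923],
    'Stream': [1040235000.0, 310083700.0, 63063470.0],
})

# Keep only numeric columns, which are the ones a dense network can consume directly
numeric = sample.select_dtypes(include='number')
print(numeric.columns.tolist())

# Count missing values per column before deciding how to handle them
missing = sample.isna().sum()
print(missing)
```

The same two calls can be applied to `df` itself to see how many usable numeric features remain and whether any of them contain missing values.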


# Question 01

# Question 02

# Question 03:
How does the choice of model complexity, such as the **number of features**, **hidden layers**, and **neurons** in a neural network, influence the risk of **underfitting** in machine learning models, and how can model complexity be optimized to balance the trade-off between underfitting and overfitting?

## How will we proceed?

In this section, we will investigate how the choice of model complexity influences the risk of underfitting in machine learning models.
Model complexity in a neural network is determined by:
- Number of features
- Number of hidden layers
- Number of neurons per hidden layer

These are crucial design decisions when building a model. Finding the right balance between model complexity and performance is essential for developing models that generalize well to unseen data.
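
One rough way to put a number on these three choices is to count the trainable parameters of a fully connected network. The sketch below is an illustration only: `dense_param_count` is a hypothetical helper (not part of Keras), and it assumes every hidden layer has the same width and that the network ends in a single output unit.

```python
def dense_param_count(num_features, hidden_layers, neurons, outputs=1):
    """Trainable parameters of a fully connected network with equal-width hidden layers."""
    # Layer sizes from input to output, e.g. [26, 32, 1] for one hidden layer
    sizes = [num_features] + [neurons] * hidden_layers + [outputs]
    # Each dense layer has (inputs * units) weights plus one bias per unit
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Compare how the parameter count grows with depth for 26 features and 32 neurons
for layers in (1, 2, 3):
    print(layers, dense_param_count(num_features=26, hidden_layers=layers, neurons=32))
```

Parameter count is only a proxy for complexity, but it makes the effect of adding features, layers, or neurons easy to compare across configurations.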

To address this, we will conduct a series of experiments to understand the relationship between model complexity and the risk of underfitting and overfitting. Our goal is to identify a configuration that strikes a balance between the two.

### Dataset Selection

To investigate this question, we will use a real dataset of songs from Spotify and YouTube.

### Baseline Model
Using the Keras library, we will create a simple neural network with one hidden layer for a binary classification problem:

In [6]:
# Number of input columns fed to the network; must match the width of the training matrix
num_features = 26

# Define the model architecture
model = Sequential()
model.add(Dense(32, input_dim=num_features, activation='relu'))  # Hidden layer with 32 neurons
model.add(Dense(1, activation='sigmoid'))  # Output layer for binary classification

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])


### Experiment

To experiment with model complexity, we will vary the number of features, hidden layers, and neurons in the neural network.
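
A minimal sketch of how such a sweep could be set up is shown here. It is not the notebook's final experiment: `build_model` is a hypothetical helper, the grid values are placeholders rather than tuned choices, and all hidden layers are assumed to share the same width. Training and evaluation are left out because they depend on how the features and the binary target are prepared.

```python
from itertools import product

from keras.layers import Dense, Input
from keras.models import Sequential


def build_model(num_features, hidden_layers, neurons):
    """Binary classifier with `hidden_layers` ReLU layers of `neurons` units each."""
    model = Sequential()
    model.add(Input(shape=(num_features,)))
    for _ in range(hidden_layers):
        model.add(Dense(neurons, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))  # Output layer for binary classification
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


# Hypothetical search grid: placeholder values, not recommendations
feature_counts = [5, 10]
layer_counts = [1, 2]
neuron_counts = [8, 32]

# One model per (features, layers, neurons) combination
configs = list(product(feature_counts, layer_counts, neuron_counts))
models = {cfg: build_model(*cfg) for cfg in configs}

for cfg, m in models.items():
    print(cfg, m.count_params())
```

Each model in `models` could then be fitted on the matching subset of features, with training and validation scores compared across configurations to spot underfitting (both scores low) and overfitting (a large gap between them).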