# 🔧 Scikit-learn (`sklearn`)

> **One of the most widely used machine learning libraries in Python** 🐍✨

| **📦 Name** | scikit-learn (imported as `sklearn`) |
|------|-------|
| **🎯 Purpose** | Simple & efficient tools for data analysis & ML |
| **💻 Language** | Python |
| **🏢 Built on** | NumPy, SciPy, Matplotlib |
| **💰 Cost** | Free & open-source |

## 🚀 What Can You Do?

| Task | Tools |
|------|-------|
| **📊 Classification** | SVM, Random Forest, K-NN, Logistic Regression |
| **📈 Regression** | Linear, Ridge, Lasso, Decision Trees |
| **🧩 Clustering** | K-Means, DBSCAN, Hierarchical |
| **📉 Dimensionality Reduction** | PCA, t-SNE |
| **🔧 Model Selection** | Cross-validation, Grid Search |
| **⚙️ Preprocessing** | Scaling, encoding, splitting data |
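
To make the Preprocessing and Model Selection rows a bit more concrete, here is a minimal sketch that chains scaling and an SVM classifier in one pipeline and scores it with 5-fold cross-validation. The bundled iris dataset and the names `pipeline` and `cv_scores` are assumptions chosen for illustration; they are not part of the walkthrough in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative dataset (assumption): iris ships with scikit-learn, no download needed
X, y = load_iris(return_X_y=True)

# Scaling and the classifier are wrapped together so the scaler is
# re-fitted on the training part of every cross-validation fold
pipeline = make_pipeline(StandardScaler(), SVC())

# 5-fold cross-validation returns one accuracy score per fold
cv_scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean accuracy over {len(cv_scores)} folds: {cv_scores.mean():.2f}")
```

Putting the scaler inside the pipeline is a deliberate choice: it keeps statistics from the validation fold from leaking into training.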

## 💻 Quick Example

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Iris is used here only as a stand-in dataset so the snippet runs as-is
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier()
model.fit(X_train, y_train)  # 🏋️ Training!
predictions = model.predict(X_test)  # ✨ Predictions!
```

#### **🎓 Why Popular?**
✅ Easy to use: consistent API | ✅ Great docs 📚 | ✅ Perfect for beginners 🌱 | ✅ Integrates with NumPy/pandas

> ⚠️ Best for tabular data (spreadsheets). For deep learning (images/text), use TensorFlow or PyTorch! 🧠
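
The "consistent API" point can be sketched as follows: three different classifiers are trained and scored with exactly the same `fit` / `score` calls. The iris dataset, the hyperparameter values, and the `models` / `scores` names are assumptions made for this illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset (assumption): bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Three unrelated algorithms, one shared interface
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # same call for every estimator
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out rows
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

Swapping one algorithm for another only means changing the constructor line; the rest of the code stays the same.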

In [1]:
```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Suppress warnings to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')
```


In [2]:
```python
# Load the tips dataset from seaborn
df = sns.load_dataset('tips')
df.head()
```

```
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
```


In [3]:
```python
# Split the data into features (X) and target (y)
X = df[['total_bill']]
y = df['tip']
```

In [5]:
```python
# Train/test split: hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the model
model = LinearRegression()

# Fit the model on the training data
model.fit(X_train, y_train)
```


In [6]:
```python
# Predict the tip for a total bill of $12
model.predict([[12]])
```

```
array([2.20880004])
```
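
The split above sets aside a test set, but the prediction is only made for a single hand-picked value. A minimal sketch of scoring a fitted model on held-out rows might look like this. To stay runnable without a download, it uses synthetic data generated with NumPy as a stand-in for the tips dataset; the frame `df_demo`, the made-up relationship between bill and tip, and the resulting metric values are assumptions and say nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): tip is roughly 1 + 10% of the bill plus noise
rng = np.random.default_rng(42)
total_bill = rng.uniform(5, 50, size=200)
tip = 1.0 + 0.1 * total_bill + rng.normal(0, 0.5, size=200)
df_demo = pd.DataFrame({"total_bill": total_bill, "tip": tip})

X = df_demo[["total_bill"]]
y = df_demo["tip"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Evaluate only on rows the model has not seen during training
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE on the test set: {mae:.2f}")
print(f"R^2 on the test set: {r2:.2f}")
```

The same two metric calls should work unchanged with the `X_test` and `y_test` produced from the tips data.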

### **Train the model in one shot**

In [10]:
```python
# Import libraries
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Suppress warnings to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')

# Load the tips dataset from seaborn
df = sns.load_dataset('tips')

# Split the data into features (X) and target (y)
X = df[['total_bill']]
y = df['tip']

# Train/test split: hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the model and fit it on the training data
model = LinearRegression()
model.fit(X_train, y_train)

print(f'Predicted tip for a total bill of $12: {model.predict([[12]])[0]:.2f}')
```

```
Predicted tip for a total bill of $12: 2.21
```


### **Multiple linear regression**

In [17]:
```python
# Import libraries
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression

# Suppress warnings to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')

# Load the tips dataset from seaborn
df = sns.load_dataset('tips')
print(df.head())

# Split the data into multiple features (X) and one target (y)
X = df[['total_bill', 'size']]
y = df['tip']

# Create the model and fit it on the full dataset (no train/test split here)
model = LinearRegression()
model.fit(X, y)

print(f'Predicted tip for a total bill of $12 and a party of 2: {model.predict([[12, 2]])[0]:.2f}')
```

```
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
Predicted tip for a total bill of $12 and a party of 2: 2.17
```
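
With more than one feature, it can help to look at what the fitted model actually learned. A linear regression prediction is the intercept plus each coefficient times its feature value, which the sketch below checks by hand. The small frame `demo` and its exact relationship (`tip = 0.5 + 0.1 * total_bill + 0.2 * size`) are invented for illustration and are not estimates from the tips dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hand-made illustrative data (assumption) with a known, noise-free relationship
demo = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 25.0, 15.0],
    "size": [2, 3, 2, 4, 1, 3],
})
demo["tip"] = 0.5 + 0.1 * demo["total_bill"] + 0.2 * demo["size"]

features = ["total_bill", "size"]
model = LinearRegression().fit(demo[features], demo["tip"])

# One coefficient per feature, in the same order as the columns
for name, coef in zip(features, model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {model.intercept_:.3f}")

# Reproduce a prediction by hand: intercept + coefficients . features
query = np.array([12.0, 2.0])
manual = model.intercept_ + np.dot(model.coef_, query)
predicted = model.predict(pd.DataFrame([query], columns=features))[0]
print(f"manual = {manual:.2f}, model.predict = {predicted:.2f}")
```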


In [31]:
```python
# Import libraries
from sklearn.linear_model import LinearRegression
import pandas as pd
import seaborn as sns

# Load the tips dataset from seaborn
df = sns.load_dataset('tips')

# Encode the smoker column as a number (Yes -> 1, No -> 0);
# astype(int) turns the mapped categorical column into plain integers
df['smoker_num'] = df['smoker'].map({'Yes': 1, 'No': 0}).astype(int)

print(df.head())

# Split the data into multiple features (X) and one target (y)
X = df[['total_bill', 'size', 'smoker_num']]
y = df['tip']

# Create the model
model = LinearRegression()

# Train the model
model.fit(X, y)

# Predict the tip for a $22 bill, a party of 4, and a smoker
print(f"The predicted tip is {model.predict([[22, 4, 1]])[0]:.2f}")
```

```
   total_bill   tip     sex smoker  day    time  size smoker_num
0       16.99  1.01  Female     No  Sun  Dinner     2          0
1       10.34  1.66    Male     No  Sun  Dinner     3          0
2       21.01  3.50    Male     No  Sun  Dinner     3          0
3       23.68  3.31    Male     No  Sun  Dinner     2          0
4       24.59  3.61  Female     No  Sun  Dinner     4          0
The predicted tip is 3.41
```
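
Mapping `Yes`/`No` to 1/0 by hand works for a two-valued column. For a column with more than two categories, such as `day`, one common alternative is one-hot encoding with `pandas.get_dummies`. The sketch below uses a tiny made-up frame (`df_demo`) whose values are assumptions for illustration; `drop_first=True` drops one category to avoid a redundant column.

```python
import pandas as pd

# Made-up illustrative rows (assumption), not taken from the tips dataset
df_demo = pd.DataFrame({
    "total_bill": [17.0, 10.0, 21.0, 24.0],
    "day": ["Sun", "Sat", "Thur", "Sun"],
})

# One 0/1 column per category; the first category (Sat) becomes the baseline
encoded = pd.get_dummies(df_demo, columns=["day"], drop_first=True, dtype=int)
print(encoded)
```

The resulting numeric columns can be passed to `LinearRegression` alongside the other features.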


### **Classification**
#### Binary Classification


In [None]:
```python
# Import libraries
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LogisticRegression

# Load the tips dataset from seaborn
df = sns.load_dataset('tips')

# Split the data into features (X) and target (y)
X = df[['total_bill', 'size']]
y = df['smoker']

# Create the model
model = LogisticRegression()

# Train the model
model.fit(X, y)

# Predict the smoker status for a $22 bill and a party of 4
print(f"The predicted smoker status is {model.predict([[22, 4]])[0]}")
```
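
A classifier can report more than a label. `predict_proba` returns one probability per class, ordered as in `classes_`, which shows how confident the model is. The sketch below uses synthetic data with an invented rule linking bill size to the label, so the frame `X_demo`, the labels `y_demo`, and any printed probabilities are assumptions and not results for the tips dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): the label depends loosely on the bill
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "size": rng.integers(1, 7, n),
})
y_demo = np.where(X_demo["total_bill"] + rng.normal(0, 5, n) > 25, "Yes", "No")

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Query with the same column names used for fitting
query = pd.DataFrame({"total_bill": [22.0], "size": [4]})
label = clf.predict(query)[0]
proba = clf.predict_proba(query)[0]

# classes_ gives the column order of predict_proba
for cls, p in zip(clf.classes_, proba):
    print(f"P({cls}) = {p:.2f}")
print(f"Predicted label: {label}")
```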