# Machine Learning
Machine learning (ML) is a branch of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to perform tasks without explicit instructions. Instead of being programmed with specific rules, machine learning systems learn patterns from data and make predictions or decisions based on that data.

### Key Concepts in Machine Learning:
1. **Data:** The foundation of machine learning is data. In general, the more representative, good-quality data a system has, the better it can learn patterns and make accurate predictions.

2. **Algorithms:** Machine learning uses various algorithms to analyze and learn from data. These algorithms are mathematical procedures that fit a model to the data, and their performance generally improves as they are exposed to more data.

3. **Training:** This is the process where the machine learning model learns from the data. During training, the model adjusts its parameters based on the input data to minimize errors in its predictions.

4. **Features:** Features are the individual measurable properties or characteristics of the data. For example, in a dataset of houses, features might include the size of the house, number of rooms, and location.

5. **Labels:** In supervised learning, labels are the output that the model is trying to predict. For example, in a house price prediction model, the label would be the price of the house.

6. **Model:** The model is the final product of the training process, which can make predictions or decisions without human intervention.
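
The concepts above can be sketched on the house example. This is a minimal illustration, assuming made-up house sizes, room counts and prices; the names `X_houses`, `y_prices` and `house_model` are hypothetical and not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: each row is one house.
# Features: size in square metres, number of rooms (made-up values).
X_houses = np.array([
    [50, 2],
    [80, 3],
    [120, 4],
    [200, 6],
])
# Labels: the price of each house (made-up values).
y_prices = np.array([100_000, 160_000, 240_000, 400_000])

# Training: the algorithm adjusts the model's parameters to fit the data.
house_model = LinearRegression().fit(X_houses, y_prices)

# The trained model can now predict the label for an unseen house.
new_house = np.array([[100, 3]])
predicted_price = house_model.predict(new_house)[0]
print(f"Predicted price: {predicted_price:.0f}")
```

Here the rows of `X_houses` are the features, `y_prices` holds the labels, `fit` is the training step, and `house_model` is the resulting model.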

### Types of Machine Learning:
1. **Supervised Learning:** The model is trained on a labeled dataset, meaning the output (label) is known. The goal is to learn a mapping from inputs to outputs. Examples include classification and regression tasks.

2. **Unsupervised Learning:** The model is trained on an unlabeled dataset, meaning the model tries to learn the underlying structure or patterns in the data. Examples include clustering and association tasks.

3. **Reinforcement Learning:** The model learns by interacting with an environment, receiving rewards or penalties based on its actions. The goal is to learn a strategy that maximizes cumulative rewards.

4. **Semi-supervised Learning:** Combines both labeled and unlabeled data for training. Typically, a small amount of labeled data and a large amount of unlabeled data are used.

5. **Deep Learning:** A subset of machine learning that uses neural networks with many layers (hence "deep") to learn complex patterns in data. It is particularly effective in tasks like image and speech recognition.
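
The difference between the first two types can be sketched as follows. This is only an illustration on hypothetical 2-D points; `KNeighborsClassifier` and `KMeans` are chosen here as convenient examples of a supervised and an unsupervised method, not because the notebook uses them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D points forming two well-separated groups.
points = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Supervised: labels are known, so the model learns an input -> label mapping.
labels = np.array([0, 0, 0, 1, 1, 1])
clf = KNeighborsClassifier(n_neighbors=3).fit(points, labels)
supervised_pred = clf.predict([[1.1, 1.0], [7.9, 8.0]])

# Unsupervised: no labels are given; KMeans groups similar points itself.
# The cluster ids are arbitrary (cluster 0 need not match label 0 above).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
cluster_ids = kmeans.labels_

print("Supervised predictions:", supervised_pred)
print("Cluster assignments:  ", cluster_ids)
```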

### Applications of Machine Learning:
- **Recommendation Systems:** Suggesting products or content to users (e.g., Netflix recommendations).
- **Speech Recognition:** Converting spoken language into text (e.g., virtual assistants like Siri).
- **Image Recognition:** Identifying objects or people in images (e.g., facial recognition).
- **Fraud Detection:** Identifying unusual patterns that may indicate fraudulent activity.
- **Self-driving Cars:** Using ML to navigate and make decisions on the road.



In [3]:
pip install jupyterthemes

Installing collected packages: ply, lesscpy, jupyterthemes
Successfully installed jupyterthemes-0.20.0 lesscpy-0.15.1 ply-3.11
Note: you may need to restart the kernel to use updated packages.


In [4]:
!jt -l

Available Themes: 
   chesterish
   grade3
   gruvboxd
   gruvboxl
   monokai
   oceans16
   onedork
   solarizedd
   solarizedl


In [7]:
!jt -t onedork -T

In [None]:
# to go back to white background
#!jt -r

# Techniques under supervised learning:
1. Classification: the target is discrete (a category)
2. Regression: the target is continuous (a number)

# scikit-learn (sklearn): a package for machine learning modeling

In [2]:
import numpy as np
import pandas as pd

In [3]:
class_df = pd.read_csv("Social_Network_Ads.csv")
regre_df = pd.read_csv("50_Startups.csv")

In [4]:
class_df.head()

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0


In [6]:
class_df["Purchased"].value_counts()

0    257
1    143
Name: Purchased, dtype: int64
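
The counts above show that the two classes are not balanced, which matters when reading accuracy scores later. A small sketch, using only the two counts printed above, of the accuracy a naive model would reach by always predicting the most frequent class (the names `counts` and `baseline_accuracy` are hypothetical):

```python
import pandas as pd

# Class counts copied from the value_counts() output above.
counts = pd.Series({0: 257, 1: 143}, name="Purchased")

# Share of each class in the dataset.
proportions = counts / counts.sum()

# Accuracy of a naive model that always predicts the most frequent class.
baseline_accuracy = proportions.max()

print(proportions)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

Any trained classifier should be judged against this baseline rather than against zero.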


# Illustration: building a classification model

In [7]:
class_df

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0
...,...,...,...,...,...
395,15691863,Female,46,41000,1
396,15706071,Male,51,23000,1
397,15654296,Female,50,20000,1
398,15755018,Male,36,33000,0


In [9]:
# Separate features/independent/input from target/output/dependent
X = class_df.copy().drop(columns=["User ID"]) # features
y = X.pop("Purchased") # target

In [12]:
y

0      0
1      0
2      0
3      0
4      0
      ..
395    1
396    1
397    1
398    0
399    1
Name: Purchased, Length: 400, dtype: int64

In [13]:
X

Unnamed: 0,Gender,Age,EstimatedSalary
0,Male,19,19000
1,Male,35,20000
2,Female,26,43000
3,Female,27,57000
4,Male,19,76000
...,...,...,...
395,Female,46,41000
396,Male,51,23000
397,Female,50,20000
398,Male,36,33000


## Variable encoding (if necessary)

### Ordinal variable (text): the categories have a natural ranking, so they can be converted to ordered numbers

### Nominal variable: the categories have no natural ranking

In [15]:
dic = {"gender":["male", "male", "female", "male", "female", "female"],
      "color":["blue", "red", "yellow", "green", "blue", "blue"],
      "review": ["good", "good", "bad", "good", "better", "better"],
      "grade":["first class", "first class", "second class", "third class", "second class", 
               "second class" ]}

df = pd.DataFrame(dic)
df

Unnamed: 0,gender,color,review,grade
0,male,blue,good,first class
1,male,red,good,first class
2,female,yellow,bad,second class
3,male,green,good,third class
4,female,blue,better,second class
5,female,blue,better,second class


In [16]:
pd.get_dummies(df, columns=["gender"])

Unnamed: 0,color,review,grade,gender_female,gender_male
0,blue,good,first class,0,1
1,red,good,first class,0,1
2,yellow,bad,second class,1,0
3,green,good,third class,0,1
4,blue,better,second class,1,0
5,blue,better,second class,1,0
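
The same one-hot approach applies to the other nominal column, `color`. The sketch below re-creates just that column so it runs on its own; passing `dtype=int` is a choice made here because newer pandas versions return `True`/`False` columns by default instead of the 0/1 shown above.

```python
import pandas as pd

# Same toy "color" values as in the DataFrame above.
colors = pd.DataFrame({"color": ["blue", "red", "yellow", "green", "blue", "blue"]})

# dtype=int forces 0/1 columns; newer pandas versions default to True/False.
color_dummies = pd.get_dummies(colors, columns=["color"], dtype=int)

# Each row has exactly one 1, in the column of its own color.
print(color_dummies)
```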


In [21]:
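from sklearn.preprocessing import LabelEncoder
le = LabelEncoder() # creating the instance
# LabelEncoder numbers the categories in sorted (alphabetical) order,
# which here happens to match the ranking: first < second < third class
le.fit_transform(df["grade"])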

array([0, 0, 1, 2, 1, 1], dtype=int64)

In [22]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder() # creating the instance
df["grade"] = le.fit_transform(df["grade"]) # replace the text grades with their integer codes

In [23]:
df

Unnamed: 0,gender,color,review,grade
0,male,blue,good,0
1,male,red,good,0
2,female,yellow,bad,1
3,male,green,good,2
4,female,blue,better,1
5,female,blue,better,1
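
For the `review` column, alphabetical order (bad, better, good) would not match the intended ranking. One way to handle this, sketched below, is scikit-learn's `OrdinalEncoder` with an explicit category order. The ranking bad < good < better is an assumption made for this example, and `review_order` and `review_code` are hypothetical names.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Same toy "review" values as in the DataFrame above.
reviews = pd.DataFrame({"review": ["good", "good", "bad", "good", "better", "better"]})

# Assumed ranking, lowest to highest: bad < good < better.
review_order = ["bad", "good", "better"]

# OrdinalEncoder expects a 2-D input and one category list per column.
encoder = OrdinalEncoder(categories=[review_order])
reviews["review_code"] = encoder.fit_transform(reviews[["review"]]).ravel().astype(int)

print(reviews)
```

A plain dictionary with `Series.map` would work as well; the encoder is convenient when the same mapping has to be reapplied to new data.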


In [25]:
pd.get_dummies(X, columns=["Gender"])

Unnamed: 0,Age,EstimatedSalary,Gender_Female,Gender_Male
0,19,19000,0,1
1,35,20000,0,1
2,26,43000,1,0
3,27,57000,1,0
4,19,76000,0,1
...,...,...,...,...
395,46,41000,1,0
396,51,23000,0,1
397,50,20000,1,0
398,36,33000,0,1


In [26]:
X_encode = pd.get_dummies(X, columns=["Gender"]) # one-hot encode the nominal Gender column
X_encode

Unnamed: 0,Age,EstimatedSalary,Gender_Female,Gender_Male
0,19,19000,0,1
1,35,20000,0,1
2,26,43000,1,0
3,27,57000,1,0
4,19,76000,0,1
...,...,...,...,...
395,46,41000,1,0
396,51,23000,0,1
397,50,20000,1,0
398,36,33000,0,1


In [30]:
from sklearn.model_selection import train_test_split

# Hold out 10% of the rows (40 of 400) for validation; random_state fixes the shuffle
x_train, x_val, y_train, y_val = train_test_split(X_encode, y, test_size=0.1, random_state=42)

In [31]:
x_train

Unnamed: 0,Age,EstimatedSalary,Gender_Female,Gender_Male
381,48,33000,0,1
55,24,55000,1,0
76,18,52000,0,1
25,47,20000,0,1
82,20,49000,0,1
...,...,...,...,...
71,24,27000,1,0
106,26,35000,1,0
270,43,133000,1,0
348,39,77000,0,1


In [32]:
len(x_val)

40
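
Because the classes are imbalanced, a small validation set can end up with a different class mix than the full data. A possible refinement, sketched here on hypothetical stand-in data with the same size and class counts, is the `stratify` argument of `train_test_split`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with the same size and class balance as above:
# 400 rows, 257 zeros and 143 ones.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 3))
y_demo = np.array([0] * 257 + [1] * 143)

# stratify=y_demo keeps the class proportions similar in both splits.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42, stratify=y_demo
)

print("Validation rows:", len(X_va))
print("Share of class 1 overall:   ", y_demo.mean())
print("Share of class 1 validation:", y_va.mean())
```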

## Build the model

In [34]:
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
model_1 = dt.fit(x_train, y_train) # training the model

In [35]:
# Evaluation
y_predict = model_1.predict(x_val)

In [36]:
y_predict

array([1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0], dtype=int64)

In [37]:
y_val

209    0
280    1
33     0
210    1
93     0
84     0
329    1
94     0
266    0
126    0
9      0
361    1
56     0
72     0
132    0
42     0
278    1
376    0
231    0
385    1
77     0
15     0
391    1
271    1
0      0
396    1
114    0
225    0
262    1
104    0
395    1
193    0
261    1
57     0
232    1
116    0
113    0
342    0
158    0
141    0
Name: Purchased, dtype: int64

In [39]:
from sklearn.metrics import accuracy_score
accuracy_score(y_val, y_predict)

0.825

In [40]:
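# Toy example of accuracy: the two lists agree in 2 of 4 positions
c = [1, 0, 1, 0]
y_toy = [0, 0, 0, 0] # named y_toy so the target Series y is not overwritten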

In [41]:
2/4 # matching positions / total positions

0.5
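
The hand calculation above can be checked against scikit-learn. In this sketch, which of the two lists holds the true labels and which the predictions is an assumption; the names `y_true_demo` and `y_pred_demo` are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Same toy values as in the two lists above; which list holds the true labels
# and which the predictions is an assumption made for this sketch.
y_true_demo = np.array([0, 0, 0, 0])
y_pred_demo = np.array([1, 0, 1, 0])

# Accuracy by hand: matching positions divided by total positions.
manual_accuracy = (y_true_demo == y_pred_demo).mean()

# The same number from scikit-learn.
sk_accuracy = accuracy_score(y_true_demo, y_pred_demo)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])

print(manual_accuracy, sk_accuracy)
print(cm)
```

The confusion matrix shows where the errors fall: under this assumption, two true 0s were predicted as 1.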

In [42]:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
model_2 = rf.fit(x_train, y_train)

In [44]:
y_predict_1 = model_2.predict(x_val)
accuracy_score(y_val, y_predict_1)

0.9
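
With only 40 validation rows, each misclassified row shifts accuracy by 0.025, so a single split gives a noisy comparison between models. One common way to get a steadier estimate is k-fold cross-validation. The sketch below uses hypothetical synthetic data from `make_classification` rather than the notebook's dataset, so its scores say nothing about the results above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for the encoded features and target.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# 5-fold cross-validation: each model is trained and scored five times,
# each time holding out a different fifth of the data.
cv_results = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Setting `random_state` on the estimators also makes the scores repeatable from run to run.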

In [49]:

# Create a logistic regression model
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
model_3 = lr.fit(x_train, y_train)

In [50]:
y_predict_2 = model_3.predict(x_val)
accuracy_score(y_val, y_predict_2)

0.675
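
Logistic regression can be sensitive to features on very different scales, such as an age column next to a salary column. That may be one reason for the lower score here, although this has not been verified on this dataset. A minimal sketch of standardizing features inside a pipeline, on hypothetical synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data; the second column is blown up to a
# salary-like scale to mimic features with very different ranges.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_demo[:, 1] = X_demo[:, 1] * 30_000 + 70_000

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42
)

# The pipeline learns the scaling on the training data only and then
# applies the same scaling before every prediction.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression())
scaled_lr.fit(X_tr, y_tr)

scaled_accuracy = scaled_lr.score(X_va, y_va)
print(f"Validation accuracy with scaling: {scaled_accuracy:.3f}")
```

Tree-based models such as the decision tree and random forest above do not need this step, since they split on one feature at a time.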

In [5]:
regre_df.head()

Unnamed: 0,R&D Spend,Administration,Marketing Spend,State,Profit
0,165349.2,136897.8,471784.1,New York,192261.83
1,162597.7,151377.59,443898.53,California,191792.06
2,153441.51,101145.55,407934.54,Florida,191050.39
3,144372.41,118671.85,383199.62,New York,182901.99
4,142107.34,91391.77,366168.42,Florida,166187.94
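
The regression data can be prepared in the same way as the classification data: separate the target, then encode the nominal column. The sketch below re-types the five rows shown above so it runs without the CSV file; `startups`, `X_reg`, `y_reg` and `X_reg_encode` are hypothetical names.

```python
import pandas as pd

# The five rows shown by regre_df.head() above, typed in by hand.
startups = pd.DataFrame({
    "R&D Spend": [165349.20, 162597.70, 153441.51, 144372.41, 142107.34],
    "Administration": [136897.80, 151377.59, 101145.55, 118671.85, 91391.77],
    "Marketing Spend": [471784.10, 443898.53, 407934.54, 383199.62, 366168.42],
    "State": ["New York", "California", "Florida", "New York", "Florida"],
    "Profit": [192261.83, 191792.06, 191050.39, 182901.99, 166187.94],
})

# Separate the features from the continuous target.
X_reg = startups.copy()
y_reg = X_reg.pop("Profit")

# State is nominal, so one-hot encode it (dtype=int gives 0/1 columns).
X_reg_encode = pd.get_dummies(X_reg, columns=["State"], dtype=int)

print(X_reg_encode.columns.tolist())
print(y_reg.dtype)
```

Because `Profit` is continuous, a regressor and a regression metric would replace the classifiers and `accuracy_score` used earlier.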
