# Intro to Machine Learning

- Machine Learning is a subfield of Artificial Intelligence that develops algorithms and statistical models that enable computers to learn from data without being explicitly programmed.
- Learning "without being explicitly programmed" means that we do not give the computer a set of rules or instructions to follow for a specific task. Instead, we provide a large amount of data and let the computer learn the patterns and relationships in that data on its own.
- The goal of Machine Learning is to enable computers to automatically improve their performance on a specific task by learning from data.
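
As a minimal illustration of learning from data rather than from hand-written rules, the sketch below recovers the slope and intercept of a line from five example points using ordinary least squares. The data points and the `fit_line` helper are made up for this example; nothing here comes from the dataset used later.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical examples generated by the rule y = 2x + 1.
# The rule itself is never given to fit_line, only the examples.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

The function is never told that the relationship is `y = 2x + 1`; it estimates both parameters from the examples alone.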


## Categories of Machine Learning

Machine Learning can be divided into three main categories: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.

- Supervised Learning involves training a model on labeled data, where the input data has a corresponding output or target variable. The goal of Supervised Learning is to learn a mapping function from the input variables to the output variable.
- Unsupervised Learning involves training a model on unlabeled data, where the input data does not have a corresponding output or target variable. The goal of Unsupervised Learning is to find patterns or structure in the data.
- Reinforcement Learning involves training a model to make decisions based on feedback from the environment.

Machine Learning is used in a wide range of applications, including image and speech recognition, natural language processing, recommendation systems, and predictive analytics.
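
To make the feedback loop in Reinforcement Learning more concrete, here is a small sketch of an epsilon-greedy agent choosing between two options (a "two-armed bandit"). This is only one simple illustration of learning from feedback, not a general Reinforcement Learning method; the `run_bandit` helper, the reward probabilities, and the parameter values are all assumptions chosen for the example.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly pick the best-looking action, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions        # how often each action was taken
    estimates = [0.0] * n_actions   # running average reward per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: take the action with the highest estimated reward.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Feedback from the environment: reward of 1 with the action's probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Update the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Hypothetical environment: the second action pays off far more often.
estimates, counts = run_bandit([0.2, 0.8])
print("Estimated rewards:", estimates)
print("Times each action was chosen:", counts)
```

The agent is never told which action is better. It ends up choosing the second action more often only because the rewards it receives push that action's estimate higher.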

## Popular algorithms

- Supervised Learning - Linear Regression, Logistic Regression, Decision Trees, Random Forests, and Neural Networks. Examples of problems that can be solved using Supervised Learning include predicting housing prices, classifying emails as spam or not spam, and recognizing handwritten digits.
- Unsupervised Learning - Clustering, Principal Component Analysis (PCA), and Association Rule Mining. Examples of problems that can be solved using Unsupervised Learning include grouping customers based on their purchasing behavior, identifying topics in a large collection of documents, and detecting anomalies in network traffic.
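
The difference between the two families can be sketched with scikit-learn on a tiny, made-up dataset: Logistic Regression receives labels along with the inputs, while k-means clustering (one common clustering algorithm) receives only the inputs. The six data points and their labels below are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy one-feature dataset: three small values and three large values.
X = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])

# Supervised Learning: labels are provided with the inputs.
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression()
clf.fit(X, y)
predicted = clf.predict(np.array([[1.1], [4.9]]))
print("Predicted labels:", predicted)

# Unsupervised Learning: only the inputs are provided.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
# The cluster numbering is arbitrary; only the grouping is meaningful.
print("Cluster assignments:", cluster_ids)
```

The classifier can name the class of a new point because it was shown labels during training. The clustering model only groups similar points together and has no notion of what each group means.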

### Installing libraries

In [None]:
!pip install -U scikit-learn

### Importing libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

### Loading dataset

In [2]:
df = pd.read_csv("data/Adidas US Sales Datasets.csv")

### Feature selection

Feature selection is the process of selecting a subset of relevant features (or variables) from the larger set of features in a dataset. The goal is to improve the performance of a machine learning model by reducing the number of features it uses. This can help to reduce overfitting, improve model interpretability, and lower the computational cost of training the model.

In [3]:
df.head()

Unnamed: 0,Retailer,Retailer ID,Invoice Date,Region,State,City,Product,Price per Unit,Units Sold,Total Sales,Operating Profit,Operating Margin,Sales Method
0,Foot Locker,1185732,1/1/2020,Northeast,New York,New York,Men's Street Footwear,$50.00,1200,"$600,000","$300,000",50%,In-store
1,Foot Locker,1185732,1/2/2020,Northeast,New York,New York,Men's Athletic Footwear,$50.00,1000,"$500,000","$150,000",30%,In-store
2,Foot Locker,1185732,1/3/2020,Northeast,New York,New York,Women's Street Footwear,$40.00,1000,"$400,000","$140,000",35%,In-store
3,Foot Locker,1185732,1/4/2020,Northeast,New York,New York,Women's Athletic Footwear,$45.00,850,"$382,500","$133,875",35%,In-store
4,Foot Locker,1185732,1/5/2020,Northeast,New York,New York,Men's Apparel,$60.00,900,"$540,000","$162,000",30%,In-store
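
As a rough sketch of what selecting features from this table could look like, the example below re-types the five rows shown above, converts the currency strings to numbers, and keeps two columns as inputs and one as the target. The choice of `Price per Unit` and `Units Sold` as features and `Total Sales` as the target is an assumption made for illustration, as is the `to_number` helper.

```python
import pandas as pd

# The five rows from df.head(), re-typed so this sketch is self-contained.
sample = pd.DataFrame({
    "Product": [
        "Men's Street Footwear",
        "Men's Athletic Footwear",
        "Women's Street Footwear",
        "Women's Athletic Footwear",
        "Men's Apparel",
    ],
    "Price per Unit": ["$50.00", "$50.00", "$40.00", "$45.00", "$60.00"],
    "Units Sold": [1200, 1000, 1000, 850, 900],
    "Total Sales": ["$600,000", "$500,000", "$400,000", "$382,500", "$540,000"],
})


def to_number(series):
    """Strip '$', ',' and '%' from a string column and convert it to float."""
    return series.str.replace(r"[$,%]", "", regex=True).astype(float)


clean = sample.copy()
for column in ["Price per Unit", "Total Sales"]:
    clean[column] = to_number(clean[column])

# Keep only the columns chosen as model inputs, plus the target column.
feature_columns = ["Price per Unit", "Units Sold"]
X = clean[feature_columns]
y = clean["Total Sales"]

print(X)
print(y)
```

Columns such as `Retailer ID` or `City` are left out here only to keep the example small; whether they are useful for a given model is something to check on the full dataset.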


In [None]:
from sklearn.model_selection import train_test_split

# Read in the data
df = pd.read_csv('data.csv')

# Separate the feature columns from the target column
X = df[['feature1', 'feature2', 'feature3']]
y = df['target']

# Split the data into training and testing sets
# (test_size and random_state are illustrative choices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the linear regression model
model = LinearRegression()

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = model.predict(X_test)

# Evaluate the model's performance (R^2 score on the testing data)
score = model.score(X_test, y_test)
print('Model score:', score)

This code assumes that you have a CSV file called 'data.csv' with columns for the features and the target variable. You'll need to replace 'feature1', 'feature2', 'feature3', and 'target' with the actual column names in your data.
