# Machine Learning

### What is Machine Learning?

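* **Definition:** The field of study that gives "computers the ability to learn without being explicitly programmed" (Arthur Samuel)
* **Traditional vs. Machine Learning:** In the animal classification example, traditional programming hand-codes the rules for telling animals apart, while machine learning infers the rules from labeled examples
* **Inspiration:** The human learning process, i.e., learning from examples and experience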
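
The contrast between the two approaches can be sketched in a few lines. This is only an illustration: the two features (`weight_kg`, `retractable_claws`), the toy values, and the `classify_by_rules` helper are all hypothetical, and a decision tree is just one convenient choice of learner.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: [weight_kg, retractable_claws (1 = yes, 0 = no)]
X = [
    [4.0, 1], [3.5, 1], [5.0, 1],      # cats
    [20.0, 0], [30.0, 0], [12.0, 0],   # dogs
]
y = ["cat", "cat", "cat", "dog", "dog", "dog"]


def classify_by_rules(animal):
    """Traditional approach: a person writes the rule by hand."""
    weight_kg, retractable_claws = animal
    return "cat" if retractable_claws == 1 and weight_kg < 8 else "dog"


# Machine learning approach: the rule is inferred from labeled examples.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

new_animal = [4.5, 1]
rule_prediction = classify_by_rules(new_animal)
learned_prediction = model.predict([new_animal])[0]
print(rule_prediction, learned_prediction)
```

The hand-written rule has to be revised whenever a new kind of animal shows up, whereas the learned model is updated by retraining on more examples.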

### Machine Learning Techniques

* **Regression/Estimation:** Predicting continuous values (house price, CO2 emission)
* **Classification:** Predicting categories (benign/malignant cell, customer churn)
* **Clustering:** Grouping similar cases (patients, customers)
* **Association:** Finding co-occurring items (grocery purchases)
* **Anomaly Detection:** Discovering unusual cases (credit card fraud)
* **Sequence Mining:** Predicting next events (website click-stream)
* **Dimension Reduction:** Reducing the number of features (dimensions) while keeping most of the information
* **Recommendation Systems:** Recommending items based on similar preferences

### AI vs. Machine Learning vs. Deep Learning

* **AI:** Mimicking human cognitive functions (vision, language, creativity)
* **Machine Learning:** Statistical part of AI, learning from examples
* **Deep Learning:** Subfield of machine learning that uses multi-layer neural networks and automates more of the learning process

## Using Python for Machine Learning: A Guide

**Introduction**

* Python's popularity in machine learning and data science
* Overview of key Python packages for ML

**Core Libraries:**

* **NumPy:** Math library for efficient N-dimensional array computation (array creation and manipulation, datatypes, vectorized functions, treating images as arrays)
* **SciPy:** Collection of numerical algorithms and tools for diverse applications (signal processing, optimization, statistics)
* **Matplotlib:** Popular 2D and 3D plotting library
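
As a small illustration of the array-oriented style these libraries share, here is a minimal NumPy sketch. The values are made up, and SciPy and Matplotlib are left out to keep the example short.

```python
import numpy as np

# A 2-D array: 3 observations (rows) by 2 features (columns)
data = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
])

# Column-wise mean, computed without an explicit Python loop
col_means = data.mean(axis=0)

# Broadcasting: the 1-D means are subtracted from every row
centered = data - col_means

print(data.shape, col_means)
```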

**Essentials for Data Scientists:**

* **Pandas:** High-level library offering efficient data structures and functions for importing, manipulating, and analyzing data (numerical tables, timeseries)
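
A brief sketch of what working with a numerical table in pandas can look like. The column names and values are hypothetical, chosen only to echo the CO2 emission example used in these notes.

```python
import pandas as pd

# Hypothetical values, for illustration only
cars = pd.DataFrame({
    "engine_size": [2.0, 2.4, 1.5, 3.5],
    "cylinders": [4, 4, 4, 6],
    "co2_emissions": [196, 221, 136, 255],
})

# Filter rows with a boolean mask
large = cars[cars["cylinders"] > 4]

# Aggregate: mean emission per cylinder count
mean_by_cyl = cars.groupby("cylinders")["co2_emissions"].mean()

print(large)
print(mean_by_cyl)
```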

**Focus: Scikit-Learn**

* A comprehensive machine learning library for Python
* Extensive collection of classification, regression, and clustering algorithms
* Built for integration with NumPy and SciPy
* Well-documented and user-friendly
* Streamlined workflow: pre-processing, feature selection/extraction, train/test splitting, model definition/fitting/tuning, prediction, evaluation, and export

**Scikit-Learn in Action:**

* Highlights:
    * Pre-processing: standardizing data, handling outliers, feature scaling
    * Train/test split: easy separation of data for training and testing
    * Model building: defining and initializing an algorithm (e.g., support vector classifier)
    * Model training: fitting the model with the training data
    * Prediction: using the trained model on the test data for class prediction
    * Model evaluation: metrics like confusion matrix to assess accuracy
    * Model saving: preserving the trained model for future use
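
The steps above can be sketched end to end as follows. This is a minimal example under stated assumptions, not a tuned pipeline: it uses the breast cancer dataset bundled with scikit-learn, a support vector classifier with default hyperparameters, an arbitrary 33% test split, and `pickle` from the standard library for saving.

```python
import pickle

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load a labeled dataset bundled with scikit-learn (benign/malignant tumors)
X, y = load_breast_cancer(return_X_y=True)

# Train/test split (the split size and seed are arbitrary choices)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Pre-processing: standardize features, fitting the scaler on training data only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model building and training
clf = SVC()
clf.fit(X_train_scaled, y_train)

# Prediction on the held-out test data
y_pred = clf.predict(X_test_scaled)

# Evaluation: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Model saving: serialize the trained model, then restore it
saved = pickle.dumps(clf)
restored = pickle.loads(saved)
```

Fitting the scaler on the training split only keeps information about the test data from leaking into training.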



## Supervised vs. Unsupervised Learning: Demystifying Machine Learning Models

**Introduction:**

* This section introduces the concepts of supervised and unsupervised learning.

### **Supervised Learning:**

* **Concept:** To supervise is to observe and direct a task. Here we "teach" the model by training it on data whose correct answers are already known, so that it can predict unknown or future cases.
* **Data:** Requires labeled datasets where data points have predefined classes or values.
* **Example:** Cancer dataset with patient data and known classifications (benign/malignant).
* **Components:**
    * Attributes (column names, e.g., "clump thickness")
    * Features (the columns themselves, which hold the data values)
    * Observations (rows, i.e., individual data points)
* **Data types:** Numeric and categorical.

#### **Techniques:**

![Alt text](image.png)

* **Classification:** Predicting discrete categories (e.g., cancer diagnosis).
    
![Alt text](image-1.png)
    
* **Regression:** Predicting continuous values (e.g., CO2 emission based on car features).

![Alt text](image-2.png)

* **Example:** Predicting CO2 emission of a new car using existing car data.
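
A minimal sketch of this kind of regression, assuming a single feature (engine size) and a linear model. The numbers are made up and deliberately exactly linear so the fit is easy to check; real emission data would be noisy and would likely need more features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical existing cars: engine size in liters and CO2 emission.
# Values follow co2 = 80 * engine_size + 20 exactly, for illustration only.
engine_size = np.array([[1.5], [2.0], [2.4], [3.0], [3.5]])
co2 = np.array([140.0, 180.0, 212.0, 260.0, 300.0])

# Fit on the labeled examples
model = LinearRegression()
model.fit(engine_size, co2)

# Predict a continuous value for a new, unseen car
new_car = np.array([[2.8]])
predicted_co2 = model.predict(new_car)[0]
print(round(predicted_co2, 1))
```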

### **Unsupervised Learning:**

* **Concept:** Letting the model discover patterns and insights without pre-defined labels.
* **Data:** Unlabeled datasets where data points lack predefined classes or values.
* **Process:** The model analyzes the data and seeks hidden structures or relationships.
* **Challenges:** Algorithms tend to be harder to design and evaluate because little or nothing is known in advance about the data or the expected outcome.

#### **Techniques:**

![Alt text](image-3.png)

* **Dimensionality reduction:** Simplifying data by eliminating redundant features.
* **Density estimation:** Analyzing data distribution and identifying potentially interesting areas.
* **Market basket analysis:** Discovering relationships between purchased items (e.g., grocery items bought together).
* **Clustering:** Grouping similar data points based on their characteristics.

![Alt text](image-4.png)

* **Example:** Grouping music based on similar audio features for personalized recommendations.
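
Clustering of this kind might look like the sketch below. It assumes k-means as the algorithm and two hypothetical audio features per track (`tempo_bpm` and `energy`), with made-up values. No labels are passed to the model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical audio features per track: [tempo_bpm, energy (0 to 1)]
tracks = np.array([
    [60.0, 0.20], [65.0, 0.25], [70.0, 0.30],      # slow, calm tracks
    [150.0, 0.90], [160.0, 0.85], [155.0, 0.95],   # fast, energetic tracks
])

# Ask for two groups; only the features are given, never a label
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tracks)

print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which tracks end up in the same group. On real data the features would normally be scaled first so that tempo does not dominate the distance calculation.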

**Key Differences:**

![Alt text](image-5.png)

* **Data labels:** Supervised learning uses labeled data, while unsupervised learning uses unlabeled data.
* **Techniques:** Supervised learning has specific algorithms for classification and regression, while unsupervised learning focuses on pattern discovery and clustering.
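* **Controllability:** Supervised learning has more evaluation methods and a more controlled environment, since predictions can be checked against known labels. Unsupervised learning has fewer evaluation methods and a less controlled environment, since the model must find structure on its own.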

