# AIDL_B_AS01 - Signal Processing, Pattern Recognition and Machine Learning
## Assignment #1 - Features Extraction for Classification
### Human Activity Recognition Using Smartphones

**Dataset:** https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones

**Goal:** Recognize human activity (6 classes) from sensor readings

---
1. DATA INSPECTION AND ACQUISITION
---

In [None]:
# 1.1 Import required libraries
# - numpy, pandas, matplotlib, scipy, etc.


In [None]:
# 1.2 Load specific sensor data files
# Training data (7352 x 128):
#   - body_acc_x_train.txt
#   - total_acc_x_train.txt
#   - body_gyro_x_train.txt
# Testing data (2947 x 128):
#   - body_acc_x_test.txt
#   - total_acc_x_test.txt
#   - body_gyro_x_test.txt
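
A minimal sketch of the loading step. It assumes the archive has been extracted so that the signal files sit under `<data_root>/train/Inertial Signals/` and `<data_root>/test/Inertial Signals/`; the helper name `load_signals` and the `data_root` argument are hypothetical and should be adapted to the local layout.

```python
from pathlib import Path

import numpy as np

# The three sensor files used in this assignment
SENSORS = ("body_acc_x", "total_acc_x", "body_gyro_x")


def load_signals(data_root, split):
    """Return {sensor_name: array of shape (n_windows, 128)} for split 'train' or 'test'.

    Assumes the folder layout <data_root>/<split>/Inertial Signals/<sensor>_<split>.txt.
    """
    folder = Path(data_root) / split / "Inertial Signals"
    # Each file is whitespace-separated text with one 128-sample window per row
    return {name: np.loadtxt(folder / f"{name}_{split}.txt") for name in SENSORS}
```

The label files can be read the same way, for example with `np.loadtxt(path, dtype=int)`, which gives one integer activity label per window.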


In [None]:
# 1.3 Load labels
#   - y_train.txt (activity labels for training)
#   - y_test.txt (activity labels for testing)


In [None]:
# 1.4 Data visualization and exploration
# - Display sample signals
# - Check data dimensions
# - Verify data format and ranges
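
One possible way to check dimensions and ranges is sketched below. The helper `describe_signals` is a hypothetical name, and the demo uses synthetic data as a stand-in for the real sensor arrays.

```python
import numpy as np


def describe_signals(signals):
    """Collect shape, value range, and NaN count for each sensor array."""
    report = {}
    for name, X in signals.items():
        X = np.asarray(X, dtype=float)
        report[name] = {
            "shape": X.shape,
            "min": float(np.nanmin(X)),
            "max": float(np.nanmax(X)),
            "n_nan": int(np.isnan(X).sum()),
        }
    return report


# Synthetic stand-in for a loaded sensor dictionary (assumption: real data has 128 columns)
rng = np.random.default_rng(0)
demo = {"body_acc_x": rng.normal(size=(5, 128))}
report = describe_signals(demo)
for name, stats in report.items():
    print(name, stats)
```

A single window can then be displayed with `matplotlib.pyplot.plot(X[i])` to inspect its shape over the 128 samples.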


---
2. YULE-WALKER LINEAR PREDICTION
---

In [None]:
# 2.1 Select 10 random samples from body_acc_x_train.txt
# - Randomly choose 10 vectors of 128 samples
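
A small sketch of the selection step, assuming the training matrix is already in memory as a NumPy array. The function name and the fixed `seed` are illustrative choices; fixing the seed only makes the selection reproducible.

```python
import numpy as np


def pick_random_windows(X, n=10, seed=0):
    """Draw n distinct rows (windows) from X and return their indices and values."""
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no window is selected twice
    idx = rng.choice(X.shape[0], size=n, replace=False)
    return idx, X[idx]
```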


In [None]:
# 2.2 Implement Yule-Walker linear prediction
# - Use Yule-Walker equations to calculate AR coefficients
# - Predict current sample based on p previous samples
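
The two bullets above can be sketched as follows. This is one possible NumPy-only formulation under stated assumptions: the signal is mean-removed, the biased autocorrelation estimate is used, and the first `p` samples are left unpredicted. The function names are hypothetical.

```python
import numpy as np


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of the mean-removed signal."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_lag + 1)])


def yule_walker(x, p):
    """Solve the Yule-Walker system R a = r for the AR(p) coefficients a[0..p-1]."""
    r = autocorr(x, p)
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    R = r[lags]  # p x p Toeplitz matrix with R[i, j] = r[|i - j|]
    # Note: R is singular for a constant window, in which case solve() raises
    return np.linalg.solve(R, r[1:])


def predict(x, a):
    """One-step prediction x_hat[n] = sum_k a[k] * x[n-1-k] for n >= p."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    xc = x - mean
    p = len(a)
    x_hat = xc.copy()  # the first p samples are copied, not predicted
    for n in range(p, len(xc)):
        x_hat[n] = np.dot(a, xc[n - p : n][::-1])
    return x_hat + mean
```

The prediction error for one window is then, for example, the mean squared difference between `x[p:]` and `predict(x, a)[p:]`.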


In [None]:
# 2.3 Find optimal order p
# - Test different orders starting from p=2
# - Calculate prediction error for each order
# - Compute average error over all 10 cases
# - Select p that minimizes average error
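
The order search could be sketched as below. As an assumption, the Levinson-Durbin recursion is used here because it yields the Yule-Walker solutions for every order up to `p_max` in one pass; `p_max` and all function names are hypothetical. The in-sample error tends to keep shrinking as `p` grows, so the curve is worth inspecting rather than relying on the minimum alone.

```python
import numpy as np


def levinson(r, p_max):
    """Return {order: predictor coefficients} for orders 1..p_max from autocorrelation r."""
    a = np.zeros(0)
    err = r[0]  # zero for a constant window, which would cause a division by zero
    coeffs = {}
    for m in range(1, p_max + 1):
        # Reflection coefficient for order m
        k = (r[m] - np.dot(a, r[m - 1 : 0 : -1])) / err
        a = np.concatenate([a - k * a[::-1], [k]])
        err *= 1.0 - k ** 2
        coeffs[m] = a.copy()
    return coeffs


def prediction_mse(xc, a):
    """Mean squared one-step prediction error on a mean-removed signal xc."""
    p = len(a)
    # Column k holds xc[n-1-k] for n = p .. len(xc)-1
    past = np.stack([xc[p - 1 - k : len(xc) - 1 - k] for k in range(p)], axis=1)
    return float(np.mean((xc[p:] - past @ a) ** 2))


def average_error_by_order(windows, p_max):
    """Average the prediction MSE over all windows for orders 2..p_max."""
    errors = np.zeros(p_max - 1)
    for x in windows:
        xc = np.asarray(x, dtype=float)
        xc = xc - xc.mean()
        n = len(xc)
        r = np.array([np.dot(xc[: n - k], xc[k:]) / n for k in range(p_max + 1)])
        coeffs = levinson(r, p_max)
        for p in range(2, p_max + 1):
            errors[p - 2] += prediction_mse(xc, coeffs[p])
    return np.arange(2, p_max + 1), errors / len(windows)
```

With `orders, err = average_error_by_order(windows, p_max)`, the order with the smallest average error is `orders[np.argmin(err)]`.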


In [None]:
# 2.4 Visualization
# - Plot original vs predicted data for best order p
# - Show one example from the 10 random vectors


2.5 Analysis Questions

Q1: What happens when p is too high?

*Answer:*

Q2: How is linear prediction related to AR modeling? When is the output of a linear predictor an AR process?

*Answer:*

---
3. KNN ALGORITHM IMPLEMENTATION
---

In [None]:
# 3.1 Implement KNN classifier from scratch
# - Calculate Euclidean distance between test and train samples
# - Find K nearest neighbors
# - Perform majority voting for classification

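import numpy as np

def euclidean_distance(x1, x2):
    """
    Calculate Euclidean distance between two vectors
    """
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.sqrt(np.sum(diff ** 2))

def knn_classifier(X_train, y_train, X_test, k=3):
    """
    Implement KNN classification
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train).ravel()
    y_pred = []
    for x in np.atleast_2d(np.asarray(X_test, dtype=float)):
        # Vectorized Euclidean distances from this test sample to all training samples
        dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
        nearest = y_train[np.argsort(dists, kind="stable")[:k]]
        # Majority vote; ties go to the smallest label (an assumption, np.unique sorts labels)
        labels, counts = np.unique(nearest, return_counts=True)
        y_pred.append(labels[np.argmax(counts)])
    return np.array(y_pred)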


In [None]:
# 3.2 Implement accuracy metric
# - Total accuracy = (# correctly classified) / (# total instances) * 100%

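import numpy as np

def calculate_accuracy(y_true, y_pred):
    """
    Calculate classification accuracy as a percentage
    """
    y_true, y_pred = np.ravel(y_true), np.ravel(y_pred)
    return 100.0 * np.mean(y_true == y_pred)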


---
4. KNN BENCHMARK (RAW DATA)
---

In [None]:
# 4.1 Perform KNN on raw data
# - Use all 3 sensor files (body_acc_x, total_acc_x, body_gyro_x)
# - Set K=3
# - Use Euclidean distance metric
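
A sketch of how the three sensor files might be combined for the benchmark, assuming they are concatenated column-wise into one 384-sample vector per window. The helper names are hypothetical, and `knn_predict` is a vectorized variant that assumes non-negative integer labels (as `np.bincount` requires).

```python
import numpy as np

SENSOR_ORDER = ("body_acc_x", "total_acc_x", "body_gyro_x")


def stack_sensors(signals, order=SENSOR_ORDER):
    """Concatenate per-sensor (n, 128) arrays into one (n, 384) feature matrix."""
    return np.hstack([signals[name] for name in order])


def knn_predict(X_train, y_train, X_test, k=3):
    """KNN with Euclidean distance, computed for all test samples at once."""
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (X_test ** 2).sum(axis=1)[:, None]
        - 2.0 * X_test @ X_train.T
        + (X_train ** 2).sum(axis=1)[None, :]
    )
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    votes = y_train[nearest]
    # bincount + argmax picks the most frequent label (smallest label on ties)
    return np.array([np.bincount(row).argmax() for row in votes])
```

Ranking by squared distance gives the same neighbors as ranking by Euclidean distance, so the square root can be skipped.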


In [None]:
# 4.2 Report baseline accuracy
# - Calculate and report total classification accuracy
# - This serves as benchmark for comparison


---
5. KNN DATA PRE-PROCESSING (NORMALIZATION)
---

In [None]:
# 5.1 Implement normalization methods
# Options to consider:
#   - Zero mean and unit standard deviation (z-score)
#   - Range normalization to [0, 1] (min-max scaling)
#   - Global mean subtraction and global std division

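import numpy as np

def normalize_zscore(X_train, X_test):
    """
    Z-score normalization (assumption: per-feature statistics from the training set only)
    """
    X_train, X_test = np.asarray(X_train, dtype=float), np.asarray(X_test, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std

def normalize_minmax(X_train, X_test):
    """
    Min-max normalization to [0, 1] (test values outside the training range fall outside [0, 1])
    """
    X_train, X_test = np.asarray(X_train, dtype=float), np.asarray(X_test, dtype=float)
    low = X_train.min(axis=0)
    span = X_train.max(axis=0) - low
    span = np.where(span == 0, 1.0, span)  # avoid division by zero for constant features
    return (X_train - low) / span, (X_test - low) / span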


In [None]:
# 5.2 Apply normalization to training and test data
# - Normalize both train and test sets consistently


In [None]:
# 5.3 Perform KNN with K=3 on normalized data
# - Calculate classification accuracy


5.4 Comparison with Benchmark

Discuss improvement (or lack thereof) compared to raw data:

---
6. SELECTING MOST IMPORTANT TRAINING SET
---

In [None]:
# 6.1 Analyze importance of each sensor type
# - body_acc_x: Body acceleration (x-axis)
# - total_acc_x: Total acceleration (x-axis)
# - body_gyro_x: Body gyroscope (x-axis)
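
One cheap way to compare the sensors is to classify with each of them alone. The sketch below swaps in a nearest-centroid classifier as a fast proxy, which is an assumption and not the KNN required elsewhere in the assignment. The function names are hypothetical.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy (%) when each test window is assigned to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[np.argmin(d2, axis=1)]
    return 100.0 * np.mean(pred == y_test)


def rank_sensors(train, y_train, test, y_test):
    """Score every sensor on its own and sort from most to least discriminative."""
    scores = {
        name: nearest_centroid_accuracy(train[name], y_train, test[name], y_test)
        for name in train
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The same ranking can be repeated with the KNN classifier itself, which is slower but matches the evaluation used for the final results.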


6.2 Selection and Justification

Selected two most important files:
1. 
2. 

Justification:

In [None]:
# 6.3 Create reduced training and test sets
# - Use only the selected two sensor types


---
7. KNN FEATURES DESIGN
---

In [None]:
# 7.1 Feature Configuration A: LPC Coefficients
# - Use Linear Predictive Coding (LPC) coefficients from step 2
# - Extract p coefficients (from optimal order)
# - Maximum feature vector length: 32

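import numpy as np

def extract_lpc_features(X, order):
    """
    Extract LPC coefficients as features (one row of `order` coefficients per signal)
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    feats = np.zeros((X.shape[0], order))
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    for i, x in enumerate(X):
        x = x - x.mean()
        # Biased autocorrelation for lags 0..order
        r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order] / len(x)
        # Yule-Walker system; lstsq tolerates a singular matrix (constant window)
        feats[i] = np.linalg.lstsq(r[lags], r[1:], rcond=None)[0]
    return feats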


In [None]:
# 7.2 Feature Configuration B: Time Domain Features
# Possible features:
#   - Mean, Standard deviation, Variance
#   - Minimum, Maximum, Range
#   - Median, Quartiles
#   - Zero-crossing rate
#   - Mean crossing rate
#   - Skewness, Kurtosis
#   - Energy, Entropy
#   - Signal magnitude area

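import numpy as np

def extract_time_features(X):
    """
    Extract time domain features
    Returns 8 columns per signal: mean, std, min, max, median,
    zero-crossing rate, mean-crossing rate, energy (an illustrative subset of the list above)
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mean = X.mean(axis=1, keepdims=True)
    centered = X - mean
    # Crossing rate = fraction of consecutive sample pairs whose sign differs
    zcr = np.mean(np.signbit(X[:, 1:]) != np.signbit(X[:, :-1]), axis=1)
    mcr = np.mean(np.signbit(centered[:, 1:]) != np.signbit(centered[:, :-1]), axis=1)
    return np.column_stack([
        mean.ravel(), X.std(axis=1), X.min(axis=1), X.max(axis=1),
        np.median(X, axis=1), zcr, mcr, np.mean(X ** 2, axis=1),
    ])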


In [None]:
# 7.3 Feature Configuration C: Frequency Domain Features
# - Apply FFT to each signal
# - Extract features from frequency spectrum:
#   * Spectral energy
#   * Spectral entropy
#   * Dominant frequency
#   * Spectral centroid
#   * Power spectral density features
#   * Frequency bands energy

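import numpy as np

def extract_frequency_features(X, fs=50.0):
    """
    Extract frequency domain features using FFT
    Returns 4 columns per signal: spectral energy, spectral entropy,
    dominant frequency, spectral centroid (fs in Hz; the 50 Hz default is an assumption)
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # One-sided power spectrum of each mean-removed window
    power = np.abs(np.fft.rfft(X - X.mean(axis=1, keepdims=True), axis=1)) ** 2
    freqs = np.fft.rfftfreq(X.shape[1], d=1.0 / fs)
    total = power.sum(axis=1)
    prob = power / np.where(total == 0, 1.0, total)[:, None]
    entropy = -np.sum(prob * np.log2(prob + 1e-12), axis=1)
    centroid = (prob * freqs).sum(axis=1)
    dominant = freqs[np.argmax(power, axis=1)]
    return np.column_stack([total, entropy, dominant, centroid])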


In [None]:
# 7.4 Combined Feature Configurations
# - Experiment with combinations of A, B, C
# - Design at least 3 different configurations
# - Keep total feature vector length ≤ 32
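
Combining configurations amounts to stacking feature blocks column-wise while respecting the length limit. A minimal sketch, with `combine_features` and `MAX_FEATURES` as hypothetical names:

```python
import numpy as np

MAX_FEATURES = 32  # length limit stated in the assignment


def combine_features(*blocks):
    """Stack 2-D feature blocks column-wise and enforce the feature-length limit."""
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    if any(b.ndim != 2 for b in blocks):
        raise ValueError("each block must have shape (n_samples, n_features)")
    combined = np.hstack(blocks)
    if combined.shape[1] > MAX_FEATURES:
        raise ValueError(
            f"{combined.shape[1]} features exceed the limit of {MAX_FEATURES}"
        )
    return combined
```

Because the blocks can have very different scales, normalizing the combined matrix before running KNN is worth considering.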


In [None]:
# 7.5 Evaluate each configuration
# - Run KNN (K=3) on each feature set
# - Report classification accuracy
# - Compare and select best configuration


7.6 Theoretical Question

Q: Is frequency domain analysis (FFT) related to spectrograms? How are they similar/different?

*Answer:*

---
8. KNN FINE-TUNING (HYPERPARAMETER OPTIMIZATION)
---

In [None]:
# 8.1 Test different values of K
# - Try K values: 1, 3, 5, 7, 9, 11, 13, 15, etc.
# - Use best feature configuration from step 7
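
The sweep over K can reuse a single neighbor ranking, since the K nearest neighbors are always the first K entries of the sorted distance list. The sketch below assumes non-negative integer labels and that the largest K does not exceed the number of training samples. The function name is hypothetical.

```python
import numpy as np


def accuracy_by_k(X_train, y_train, X_test, y_test, k_values=(1, 3, 5, 7, 9, 11, 13, 15)):
    """Return {k: accuracy in %} using one shared distance computation."""
    # Squared Euclidean distances between every test and training sample
    d2 = (
        (X_test ** 2).sum(axis=1)[:, None]
        - 2.0 * X_test @ X_train.T
        + (X_train ** 2).sum(axis=1)[None, :]
    )
    # Sort once, keep only as many neighbors as the largest K needs
    order = np.argsort(d2, axis=1, kind="stable")[:, : max(k_values)]
    neighbor_labels = y_train[order]
    results = {}
    for k in k_values:
        # Majority vote among the first k neighbors of each test sample
        pred = np.array([np.bincount(row[:k]).argmax() for row in neighbor_labels])
        results[k] = 100.0 * float(np.mean(pred == y_test))
    return results
```

The returned dictionary maps directly onto the K vs. accuracy plot: the keys form the x-axis and the values the y-axis.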


In [None]:
# 8.2 Plot K vs Accuracy
# - Visualize effect of K on classification accuracy


8.3 Analysis and Discussion

Q1: What happens when K increases? Why?

*Answer:*

Q2: What happens when K decreases? Why?

*Answer:*

Q3: How is this related to underfitting/overfitting?

*Answer:*

Q4: Why select odd values of K?

*Answer:*

In [None]:
# 8.4 Report optimal K value
# - Select K that gives best accuracy
# - Report final classification performance


---
9. RESULTS SUMMARY AND CONCLUSIONS
---

In [None]:
# 9.1 Summary table of all experiments
# - Benchmark accuracy (raw data)
# - Normalized data accuracy
# - Accuracy with different feature configurations
# - Accuracy with different K values


9.2 Best Configuration

Optimal Configuration:
- Normalization method: 
- Feature set: 
- K value: 
- Final accuracy: 

9.3 Insights and Conclusions

Key Findings:

Feature Importance Observations:

Lessons Learned: