# Amazon Reviews Sentiment Analysis

Start here when you are ready to build the binary (good/bad) classifier. Each section outlines the steps you need to implement.

## 1. Environment Setup
- Import the libraries you plan to use (pandas, numpy, scikit-learn, matplotlib/seaborn for visuals, etc.).
- Configure any plotting defaults and random seeds for reproducibility.
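
A minimal sketch of the setup cell, assuming NumPy and matplotlib are installed. The seed value `42`, the helper name `set_seeds`, and the specific plotting defaults are illustrative choices, not requirements of the project.

```python
import random

import matplotlib as mpl
import numpy as np

SEED = 42  # hypothetical value; any fixed integer gives repeatable runs


def set_seeds(seed: int = SEED) -> None:
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


# Plotting defaults (illustrative): a wider figure and a background grid.
mpl.rcParams.update({"figure.figsize": (8, 4), "axes.grid": True})

set_seeds()
```

scikit-learn estimators and splitters take their own `random_state` argument, so passing `SEED` to them explicitly is usually more reliable than depending on the global NumPy seed.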

## 2. Load the Dataset
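- Set the path to `review_labels.txt`. If the file is gzip-compressed, either gunzip it first or let `pandas.read_csv` decompress it (it infers gzip from a `.gz` filename, or accepts `compression="gzip"`).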
- Use `pandas.read_csv` with tab separation and assign column names.
- Map `__label__1` to 0 (bad) and `__label__2` to 1 (good).
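
The loading steps above might look like the sketch below. It assumes a headerless file with exactly two tab-separated fields per line; the column names `label` and `text`, the helper `load_reviews`, and the `quoting=csv.QUOTE_NONE` setting are assumptions for illustration. A small in-memory sample stands in for `review_labels.txt` so the snippet runs on its own.

```python
import csv
import io

import pandas as pd

LABEL_MAP = {"__label__1": 0, "__label__2": 1}  # 0 = bad, 1 = good


def load_reviews(path_or_buffer) -> pd.DataFrame:
    """Read tab-separated (label, text) rows and map labels to 0/1."""
    df = pd.read_csv(
        path_or_buffer,
        sep="\t",
        header=None,                # assumed: the file has no header row
        names=["label", "text"],    # hypothetical column names
        quoting=csv.QUOTE_NONE,     # treat quote characters as plain text
    )
    df["label"] = df["label"].map(LABEL_MAP)
    if df["label"].isna().any():
        raise ValueError("Found labels other than __label__1 / __label__2")
    df["label"] = df["label"].astype(int)
    return df


# Stand-in for the real file; replace with load_reviews("review_labels.txt").
sample = io.StringIO(
    "__label__2\tGreat product, works well\n"
    "__label__1\tBroke after one day\n"
)
reviews = load_reviews(sample)
```

Raising on unmapped labels is a deliberate choice here: `Series.map` silently produces `NaN` for unknown keys, which would otherwise surface later as a confusing training error.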

## 3. Exploratory Analysis
- Inspect class balance, text length distributions, and a few sample reviews.
- Optional: visualize label counts and token counts.
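
As a rough sketch of these checks, assuming a DataFrame with the hypothetical columns `label` (0/1) and `text`; the four rows below are made-up placeholders, not dataset content.

```python
import pandas as pd

# Placeholder data; in the notebook this would be the loaded review DataFrame.
reviews = pd.DataFrame(
    {
        "label": [1, 0, 1, 0],
        "text": [
            "Great product, works well",
            "Broke after one day",
            "Love it",
            "Terrible, do not buy",
        ],
    }
)

# Class balance as fractions of the dataset, ordered by label.
class_balance = reviews["label"].value_counts(normalize=True).sort_index()

# Length features: characters and whitespace-separated tokens per review.
reviews["n_chars"] = reviews["text"].str.len()
reviews["n_tokens"] = reviews["text"].str.split().str.len()

# Per-class summary statistics of the length features.
length_summary = reviews.groupby("label")[["n_chars", "n_tokens"]].describe()

# A few sample reviews to read by eye.
print(reviews.sample(2, random_state=0))
```

Whitespace splitting is only an approximation of the tokenization used later by the vectorizer, but it is usually close enough to spot very short or very long reviews.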

## 4. Text Preprocessing
- Decide on cleaning steps (lowercasing, punctuation removal, etc.).
- Implement any helpers needed for preprocessing.
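
One possible preprocessing helper, shown as a sketch rather than a recommended recipe: it lowercases, replaces ASCII punctuation with spaces, and collapses whitespace. The name `clean_text` is hypothetical.

```python
import re
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    return _WHITESPACE.sub(" ", text).strip()


print(clean_text("Great product -- works WELL!!"))
```

Note that stripping punctuation splits contractions ("Don't" becomes "don t"), which can weaken negation cues; it is worth comparing validation metrics with and without this step.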

## 5. Train/Validation/Test Split
- Split the dataset using `train_test_split` with stratification.
- Optionally carve off a validation set or implement cross-validation.
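
A sketch of a stratified 60/20/20 split done with two calls to `train_test_split`. The proportions, the seed, and the synthetic `texts`/`labels` lists are assumptions chosen for illustration.

```python
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 100 reviews with perfectly balanced labels.
texts = [f"review {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

# First split: hold out 20% of the data as the test set.
X_train_val, X_test, y_train_val, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Second split: 25% of the remaining 80% becomes validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.25, stratify=y_train_val, random_state=42
)
```

Passing `stratify` keeps the good/bad ratio the same in every split, which matters most when the classes are imbalanced.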

## 6. Feature Extraction
- Configure a `TfidfVectorizer` (consider n-gram range, min/max document frequency).
- Fit the vectorizer on the training data only, then use it to transform the validation and test splits.
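
The fit-on-train, transform-elsewhere pattern could be sketched as follows. The parameter values (`ngram_range=(1, 2)`, `min_df=1`, `max_df=0.95`, `sublinear_tf=True`) and the toy texts are illustrative assumptions; sensible values depend on the real corpus size.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy placeholder texts standing in for the real splits.
train_texts = [
    "great product works well",
    "broke after one day",
    "love it great value",
    "terrible do not buy",
]
test_texts = ["great value but zzzunseen"]

vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),  # unigrams and bigrams
    min_df=1,            # keep terms seen in at least one document
    max_df=0.95,         # drop terms appearing in more than 95% of documents
    sublinear_tf=True,   # use 1 + log(tf) instead of raw term frequency
)

# Learn the vocabulary and IDF weights from the training split only.
X_train = vectorizer.fit_transform(train_texts)

# Reuse the fitted vocabulary; terms unseen during fitting are ignored.
X_test = vectorizer.transform(test_texts)
```

Fitting on the training split alone keeps vocabulary and IDF statistics from leaking information out of the validation and test sets.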

## 7. Model Training
- Train a baseline classifier (Logistic Regression is a strong start).
- Optionally compare against Linear SVM or Naive Bayes.
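
A sketch of training the baseline and the two alternatives side by side, each wrapped in a pipeline with its own `TfidfVectorizer`. The toy data, the `candidates` dictionary, and `max_iter=1000` are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy placeholder training data (1 = good, 0 = bad).
train_texts = [
    "great product works well",
    "love it great value",
    "excellent quality",
    "broke after one day",
    "terrible do not buy",
    "awful waste of money",
]
train_labels = [1, 1, 1, 0, 0, 0]

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
    "naive_bayes": MultinomialNB(),
}

# Fit one vectorizer + classifier pipeline per candidate model.
fitted = {
    name: make_pipeline(TfidfVectorizer(), clf).fit(train_texts, train_labels)
    for name, clf in candidates.items()
}
```

Keeping the vectorizer inside the pipeline means each model can later be evaluated, tuned, and saved as a single object that accepts raw text.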

## 8. Evaluation
- Compute accuracy, precision, recall, F1, and the confusion matrix on the validation set while iterating; reserve the test set for the final evaluation.
- Inspect misclassified examples to gain intuition.
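
The metrics and the misclassification review can be sketched as below. The `y_true` and `y_pred` values and the texts are hand-written placeholders rather than real model output, so the numbers only illustrate the calls.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Placeholder texts, true labels, and predictions (1 = good, 0 = bad).
texts = [f"review {i}" for i in range(8)]
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)

# Precision, recall, and F1 for the positive ("good") class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)

# Rows are true labels, columns are predicted labels, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Collect (text, true label, predicted label) for every error.
misclassified = [
    (text, yt, yp) for text, yt, yp in zip(texts, y_true, y_pred) if yt != yp
]
```

Reading the misclassified reviews often reveals patterns such as sarcasm, negation, or mixed opinions that aggregate metrics hide.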

## 9. Hyperparameter Tuning (Optional)
- Use `GridSearchCV` or `RandomizedSearchCV` to tune vectorizer/model settings.
- Record the best parameters and resulting metrics.
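
A small grid search over vectorizer and model settings might look like this sketch. The step names `tfidf` and `clf`, the grid values, the F1 scoring choice, and the toy data are all assumptions; a real search would use the training split and a larger grid.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Toy placeholder data: six good (1) and six bad (0) reviews.
texts = [
    "great product", "love it works well", "excellent quality",
    "great value love it", "works great", "excellent product great quality",
    "terrible product", "broke after one day", "awful quality",
    "terrible waste of money", "broke quickly awful", "waste of money terrible quality",
]
labels = [1] * 6 + [0] * 6

pipeline = Pipeline(
    [
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
)
search.fit(texts, labels)

print(search.best_params_, round(search.best_score_, 3))
```

Because the vectorizer sits inside the pipeline, it is refit on each training fold, so the cross-validated scores are not inflated by vocabulary leakage.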

## 10. Final Model and Export
- Retrain the chosen configuration on all non-test data (train plus validation, if a validation set was used).
- Save the pipeline with `joblib.dump`.
- Demonstrate loading the pipeline and running `predict_proba` or `predict` on new text.
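
A sketch of the save, load, and predict round trip. The file name `sentiment_pipeline.joblib`, the temporary directory, and the toy data are assumptions; in the notebook the pipeline would be the retrained final model and the path a permanent location.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy placeholder training data (1 = good, 0 = bad).
texts = [
    "great product works well",
    "love it great value",
    "excellent quality",
    "broke after one day",
    "terrible do not buy",
    "awful waste of money",
]
labels = [1, 1, 1, 0, 0, 0]

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(texts, labels)

# Save and reload; a temporary directory keeps this example self-cleaning.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "sentiment_pipeline.joblib"  # hypothetical name
    joblib.dump(pipeline, model_path)
    restored = joblib.load(model_path)

# The restored pipeline accepts raw text, since the vectorizer is saved with it.
new_reviews = ["great quality, love it", "awful, broke in a day"]
probabilities = restored.predict_proba(new_reviews)  # columns: P(bad), P(good)
predictions = restored.predict(new_reviews)
```

joblib files are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.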

## 11. Next Steps
- Note ideas such as transformer fine-tuning, handling class imbalance, or deploying via API.