- This project focuses on exploring various data preparation techniques and their impact on the performance of a logistic regression model for predicting wine quality. By applying different transformations and feature selection strategies, the project aims to uncover insights into the complex relationship between wine characteristics and quality.
- The dataset used in this project includes physicochemical properties of wines, such as acidity, sugar level, and alcohol content, along with a quality rating. The objective is to predict the quality score based on these properties.
- The project follows a systematic approach, divided into two main phases:
- Data Transformation Phase: Different transformations like logarithmic, square root, and Box-Cox are applied to key features to observe their effect on model accuracy.
- Feature Selection Phase: Various combinations of features are selectively dropped from the dataset to identify their impact on the predictive power of the model.
- Minor improvements in model accuracy were observed with specific data transformations, indicating the potential of addressing feature skewness.
- The feature selection phase highlighted the importance of certain features over others and how their exclusion affects model performance.
- Python
- Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
- Clone the repository.
- Ensure Python and the required libraries are installed.
- Run the Jupyter Notebook to see the analysis and model building process.
- The project emphasizes the nuanced role of individual features and transformations in predictive modeling and provides a foundation for further research and exploration in wine quality prediction using machine learning techniques.