## Machine Learning Modeling of Concrete Compressive Strength

The dataset is available at https://archive-beta.ics.uci.edu/dataset/165/concrete+compressive+strength under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.

This project consists of developing a machine learning regression model that predicts concrete compressive strength as a nonlinear function of the mix ingredients and the curing age. The dataset is a common regression benchmark in the ML literature.

The dataset contains 1030 instances (data points) and 9 variables: 8 input features (Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, and Age) and the concrete compressive strength, which is our target (label).


## Description of Variables and Dataset

**Inputs (features):**  
- Cement (kg in a m³ mixture)  
- Blast Furnace Slag (kg in a m³ mixture)  
- Fly Ash (kg in a m³ mixture)  
- Water (kg in a m³ mixture)  
- Superplasticizer (kg in a m³ mixture)  
- Coarse Aggregate (kg in a m³ mixture)  
- Fine Aggregate (kg in a m³ mixture)  
- Age (days) — curing age of the concrete sample  

**Output (target / label):**  
- Compressive Strength (MPa) — concrete compressive strength at given age  
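
One possible way to write the task down, given the eight inputs and single output listed above, is the following. The additive noise term $\varepsilon$ and the squared-error objective are assumptions of this sketch, not requirements of the dataset:

$$
y = f(\mathbf{x}) + \varepsilon, \qquad \mathbf{x} \in \mathbb{R}^{8}, \quad y \in \mathbb{R}
$$

$$
\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - f(\mathbf{x}_i)\bigr)^2
$$

Here $\mathbf{x}$ collects the seven mixture quantities (kg per m³) and the age (days), $y$ is the compressive strength (MPa), $N$ is the number of training samples, and $\mathcal{F}$ is the model family being fitted.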

**Data file:** CSV file from the UCI dataset page

All units follow the dataset as provided. For definitions and context on compressive strength and mixture design, please see the [UCI dataset page](https://archive-beta.ics.uci.edu/dataset/165/concrete+compressive+strength).  
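
A minimal sketch of loading and inspecting the data with pandas is shown here. The short column names and the helper `load_concrete` are assumptions made for illustration (the raw file uses longer headers), and the two rows in `demo_csv` are placeholders rather than real measurements:

```python
import io

import pandas as pd

# Short column names are an assumption of this sketch; the raw file uses
# longer headers. The order follows the variable list above.
COLUMNS = [
    "cement", "slag", "fly_ash", "water", "superplasticizer",
    "coarse_agg", "fine_agg", "age", "strength",
]


def load_concrete(source):
    """Read the CSV and rename its nine columns by position."""
    df = pd.read_csv(source)
    if df.shape[1] != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {df.shape[1]}")
    df.columns = COLUMNS
    return df


# Two placeholder rows (NOT real measurements) stand in for the file here.
# In practice, pass the path of the downloaded CSV instead.
demo_csv = io.StringIO(
    "c1,c2,c3,c4,c5,c6,c7,c8,c9\n"
    "300.0,0.0,0.0,180.0,0.0,1000.0,800.0,28,35.0\n"
    "400.0,100.0,0.0,170.0,5.0,950.0,750.0,7,40.0\n"
)
df = load_concrete(demo_csv)

print(df.dtypes)                        # which columns parsed as int vs float
print(df.isna().sum())                  # missing values per column
print(df.describe().T[["min", "max"]])  # quick range check against the units
```

On the real file, `df.corr()` and per-column histograms are natural next steps for the correlation and distribution checks.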

Using the provided dataset, build three different ML regression models to predict **Compressive Strength (MPa)** from the given input features. Use **scikit-learn** and machine learning models of your own choosing.  
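
As an illustration only, three candidate regressors could be wrapped in scikit-learn pipelines as sketched below. The choice of Ridge, random forest, and gradient boosting, the hyperparameters, and the helper name `make_models` are assumptions of this sketch, and the data is a synthetic stand-in with eight columns rather than the concrete dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_models(random_state=0):
    """Return three example pipelines (hypothetical choices, not prescribed)."""
    return {
        # Linear models benefit from standardized inputs.
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        # Tree ensembles are insensitive to feature scaling, so no scaler.
        "random_forest": make_pipeline(
            RandomForestRegressor(n_estimators=100, random_state=random_state)
        ),
        "gradient_boosting": make_pipeline(
            GradientBoostingRegressor(random_state=random_state)
        ),
    }


# Synthetic stand-in for the 8 input features and the target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 8))
y = (30 * X[:, 0] - 20 * X[:, 3] + 5 * np.log1p(10 * X[:, 7])
     + rng.normal(scale=1.0, size=200))

models = make_models()
for name, model in models.items():
    model.fit(X, y)
    # Training-set R^2 only; it says nothing about generalization.
    print(name, round(model.score(X, y), 3))
```

Keeping each model inside a pipeline means any preprocessing is fitted on the training data only when the pipeline is later used with a train/test split or cross-validation.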

You should:  

1. **Formalize** the ML problem (features, target, assumptions).  
2. **Load and inspect** the dataset; perform basic analysis of the data (distributions, correlations, unit checks).  
3. Define and justify your **data split** (train/test).  
4. Build a **pipeline** (preprocessing + regressor).  
5. **Train/validate** your models (you may try multiple algorithms, but report clearly on your final choice).  
6. **Evaluate** with MAE, RMSE, and $R^2$ on the held-out test set; include residual and predicted-vs-true plots.  
7. Provide a **short discussion**: model choice rationale, performance, physical sanity-checks (e.g., does higher cement content generally increase strength?), limitations.  
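
The split, validation, and evaluation steps above might be sketched as follows. The 80/20 split, the 5-fold cross-validation, the random forest, and the helper `evaluate` are all assumptions of this example, and the data is again a synthetic stand-in:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split


def evaluate(model, X_test, y_test):
    """Return MAE, RMSE, R^2 and the residuals on held-out data."""
    y_pred = model.predict(X_test)
    metrics = {
        "MAE": mean_absolute_error(y_test, y_pred),
        # RMSE taken as the square root of the MSE.
        "RMSE": float(np.sqrt(mean_squared_error(y_test, y_pred))),
        "R2": r2_score(y_test, y_pred),
    }
    return metrics, y_test - y_pred


# Synthetic stand-in for the 8 features and the target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 8))
y = 30 * X[:, 0] - 20 * X[:, 3] + rng.normal(scale=1.0, size=300)

# Hold out 20% as a test set that is touched only once, at the end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Validation uses the training portion only (5-fold cross-validation).
cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
print("CV R2: %.3f +/- %.3f" % (cv_r2.mean(), cv_r2.std()))

model.fit(X_train, y_train)
metrics, residuals = evaluate(model, X_test, y_test)
print(metrics)
```

The returned `residuals`, together with the predictions, can be passed to a plotting library to produce the residual and predicted-vs-true plots.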




**Note: Feel free to remove non-continuous (integer) features such as Age and Blast Furnace Slag.**
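
If you choose to drop those two features, a minimal pandas sketch could look like this. The column names are hypothetical short names, and the two rows are placeholders rather than real measurements:

```python
import pandas as pd

# Placeholder frame with hypothetical short column names (not real data).
df = pd.DataFrame({
    "cement": [300.0, 400.0],
    "slag": [0.0, 100.0],
    "fly_ash": [0.0, 0.0],
    "water": [180.0, 170.0],
    "superplasticizer": [0.0, 5.0],
    "coarse_agg": [1000.0, 950.0],
    "fine_agg": [800.0, 750.0],
    "age": [28, 7],
    "strength": [35.0, 40.0],
})

TARGET = "strength"
DROPPED = ["age", "slag"]  # the two features named in the note

# Feature matrix without the dropped columns and without the target.
X = df.drop(columns=DROPPED + [TARGET])
y = df[TARGET]

print(list(X.columns))
```

Whichever features are removed, it helps to state the choice in the problem formalization so the reported metrics can be interpreted correctly.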