**Insurance Prediction Model (ML)**

---



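We built a Linear Regression model to predict medical insurance charges.
Categorical features (sex, smoker, region) were converted using one-hot encoding.
The model was trained on 80% of the data and tested on the remaining 20%.
Performance was evaluated using mean absolute error (MAE) and the R² score.
On the test set the model reached an R² of about 0.78 and an MAE of about 4181, meaning it explains roughly 78% of the variance in charges (R² is not a classification accuracy).
Finally, the model was used to predict the cost for a single, manually entered profile.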

**Import Libraries**

In [126]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

**Load Dataset**

In [118]:
df = pd.read_csv("/content/insurance.csv")

**Encode Categorical Columns**

In [119]:
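# One-hot encode the text columns into indicator (0/1 or True/False) columns.
# drop_first=True drops one level per feature so the remaining columns are not redundant.
df = pd.get_dummies(df, columns=["sex", "smoker", "region"], drop_first=True)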
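
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a made-up three-row frame (the values are illustrative and not taken from `insurance.csv`). `dtype=int` is added only so the indicators print as 0/1.

```python
import pandas as pd

# Toy rows that mimic the categorical columns of the insurance data (made-up values).
toy = pd.DataFrame({
    "age": [19, 33, 45],
    "sex": ["female", "male", "male"],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "southeast", "northwest"],
})

# The first level of each feature (alphabetically) is dropped and becomes the baseline.
encoded = pd.get_dummies(
    toy, columns=["sex", "smoker", "region"], drop_first=True, dtype=int
)
print(encoded.columns.tolist())
print(encoded)
```

In this toy frame, "female", "no", and "northwest" are the dropped baseline levels, so a row with all indicators at 0 represents that combination. Which levels are dropped always depends on the categories actually present in the data.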

**Select Features and Target**

In [120]:
X = df.drop("charges", axis=1)
y = df["charges"]

**Train Test Split**

In [121]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
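
As a small illustration of what `test_size=0.2` and `random_state=42` do, the sketch below splits a made-up 10-row array twice. The `*_demo` names and the data are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle.
a_train, a_test, ya_train, ya_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
b_train, b_test, yb_train, yb_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(len(a_train), len(a_test))      # 8 training rows, 2 test rows
print(np.array_equal(a_test, b_test)) # same seed, same split
```

Fixing the seed keeps the reported metrics reproducible between runs. A different seed would give a different split and slightly different scores.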

**Train The Model**

In [122]:
model = LinearRegression()
model.fit(X_train, y_train)
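
A fitted `LinearRegression` exposes its weights through `coef_` and `intercept_`, which can be paired with the column names to see how each feature moves the prediction. The sketch below uses synthetic, noise-free data with made-up coefficients (250 per year of age, 20000 for smokers, base 1000) so the recovered values are easy to check. None of these numbers come from the insurance dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic features (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "age": rng.integers(18, 65, size=50),
    "smoker_yes": rng.integers(0, 2, size=50),
})

# Target built from known weights so the fit can be verified.
y_demo = 250 * X_demo["age"] + 20000 * X_demo["smoker_yes"] + 1000

demo_model = LinearRegression().fit(X_demo, y_demo)

# Label each learned weight with its feature name.
coefficients = pd.Series(demo_model.coef_, index=X_demo.columns)
print(coefficients.round(2))
print("Intercept:", round(demo_model.intercept_, 2))
```

The same pattern, `pd.Series(model.coef_, index=X_train.columns)`, should work on the model trained above, although the real coefficients will depend on the data.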

**Prediction**

In [127]:
predictions = model.predict(X_test)

**Evaluation**

In [125]:
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("Mean Absolute Error:", round(mae, 2))
print("R2 Score:", round(r2, 2))

Mean Absolute Error: 4181.19
R2 Score: 0.78
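
To show what these two numbers measure, here is a minimal sketch that computes MAE and R² by hand with NumPy. MAE is the average absolute gap between actual and predicted values; R² is one minus the ratio of residual to total squared error, so it reads as the share of variance explained rather than a percentage of correct predictions. The helper name `mae_and_r2` and the four sample values are assumptions for illustration.

```python
import numpy as np

def mae_and_r2(y_true, y_pred):
    """Return (MAE, R²) computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Mean absolute error: average size of the prediction error.
    mae = np.mean(np.abs(y_true - y_pred))

    # R² = 1 - (residual sum of squares / total sum of squares).
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return mae, 1.0 - ss_res / ss_tot

# Made-up values: every prediction is off by exactly 10.
mae_demo, r2_demo = mae_and_r2([100, 200, 300, 400], [110, 190, 310, 390])
print(mae_demo)           # 10.0
print(round(r2_demo, 3))  # 0.992
```

These definitions match what `mean_absolute_error` and `r2_score` compute for a single target column.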


**Manual Test**

In [128]:
# Column names and order must match the features (X) the model was trained on.
new_person = pd.DataFrame({
    "age": [31],
    "bmi": [45],
    "children": [2],
    "sex_male": [1],
    "smoker_yes": [1],
    "region_northwest": [0],
    "region_southeast": [1],
    "region_southwest": [0]
})

predicted_cost = model.predict(new_person)
print("Predicted Medical Cost:", round(predicted_cost[0], 2))

Predicted Medical Cost: 35029.42
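
Typing the encoded columns by hand is easy to get wrong, so one option is to start from raw values and align them to the training columns. The sketch below is one possible approach, not part of the original notebook: the helper `align_to_training_columns` is hypothetical, and `training_columns` is written out here to mirror the encoded feature names used in the manual test (in the notebook it could be taken from `X.columns`).

```python
import pandas as pd

def align_to_training_columns(raw_row, training_columns):
    """One-hot encode raw input and match the training column layout."""
    # Encode only the levels present in the input row.
    encoded = pd.get_dummies(raw_row, dtype=int)
    # Add missing indicator columns as 0, drop extras, and fix the order.
    return encoded.reindex(columns=training_columns, fill_value=0)

# Assumed to equal list(X.columns) from the training step.
training_columns = [
    "age", "bmi", "children", "sex_male", "smoker_yes",
    "region_northwest", "region_southeast", "region_southwest",
]

raw = pd.DataFrame({
    "age": [31], "bmi": [45], "children": [2],
    "sex": ["male"], "smoker": ["yes"], "region": ["southeast"],
})

aligned = align_to_training_columns(raw, training_columns)
print(aligned.iloc[0].tolist())  # [31, 45, 2, 1, 1, 0, 1, 0]
```

The aligned frame could then be passed to `model.predict`. Baseline levels such as "female" produce a column that is not in the training layout and is dropped, which leaves the matching indicator at 0 as intended.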
