# Practicum Assignment

## Multiple Linear Regression

Identify the variables to use as independent variables (features) and the target variable (personal medical costs).

In [14]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.preprocessing import LabelEncoder, StandardScaler

In [15]:
# read the data from the CSV file
data = pd.read_csv('docs/insurance.csv')
data.head()

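|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.924 |
| 1 | 18 | male | 33.77 | 1 | no | southeast | 1725.5523 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | no | northwest | 3866.8552 |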


In [16]:
# check the size of the data (rows, columns)
data.shape

(1338, 7)

In [17]:
# column types and non-null counts
data.info()

# summary statistics for the numeric columns
data.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB


|   | age | bmi | children | charges |
|---|---|---|---|---|
| count | 1338.0 | 1338.0 | 1338.0 | 1338.0 |
| mean | 39.207025 | 30.663397 | 1.094918 | 13270.422265 |
| std | 14.04996 | 6.098187 | 1.205493 | 12110.011237 |
| min | 18.0 | 15.96 | 0.0 | 1121.8739 |
| 25% | 27.0 | 26.29625 | 0.0 | 4740.28715 |
| 50% | 39.0 | 30.4 | 1.0 | 9382.033 |
| 75% | 51.0 | 34.69375 | 2.0 | 16639.912515 |
| max | 64.0 | 53.13 | 5.0 | 63770.42801 |


- Split the dataset into training data and test data in a suitable proportion.

In [18]:
# Encoding step: convert the categorical columns to integer codes
le = LabelEncoder()  # create a LabelEncoder object
data['sex'] = le.fit_transform(data['sex'])        # female -> 0, male -> 1
data['smoker'] = le.fit_transform(data['smoker'])  # no -> 0, yes -> 1
data['region'] = le.fit_transform(data['region'])  # regions -> 0..3 in alphabetical order
data.head()  # show the first five rows after encoding

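|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | 0 | 27.9 | 0 | 1 | 3 | 16884.924 |
| 1 | 18 | 1 | 33.77 | 1 | 0 | 2 | 1725.5523 |
| 2 | 28 | 1 | 33.0 | 3 | 0 | 2 | 4449.462 |
| 3 | 33 | 1 | 22.705 | 0 | 0 | 1 | 21984.47061 |
| 4 | 32 | 1 | 28.88 | 0 | 0 | 1 | 3866.8552 |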
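
`LabelEncoder` turns `region` into the integers 0 to 3, which a linear model treats as an ordered numeric scale even though the regions have no natural order. One common alternative is one-hot encoding. The sketch below uses `pandas.get_dummies` on a small made-up frame (the `toy` DataFrame is hypothetical and is not the insurance data); it is an illustration of the idea, not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical toy frame that mimics the categorical columns of the dataset.
toy = pd.DataFrame({
    "sex": ["female", "male", "male"],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "southeast", "northwest"],
})

# One 0/1 column per category level; drop_first=True removes one level
# per column so the remaining columns are not perfectly collinear.
encoded = pd.get_dummies(
    toy, columns=["sex", "smoker", "region"], drop_first=True, dtype=int
)
print(encoded)
```

With one-hot columns, each region gets its own coefficient instead of sharing a single slope across arbitrary integer codes.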


In [19]:
# features: every column except the target
X = data.drop(columns=["charges"])
# target: personal medical costs
y = data["charges"]

In [20]:
# hold out 20% of the rows for testing; a fixed seed keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f'X_test:\n{X_test}\n')
print(f'Y_test: {y_test}')

X_test:
      age  sex     bmi  children  smoker  region
764    45    0  25.175         2       0       0
887    36    0  30.020         0       0       1
890    64    0  26.885         0       1       1
1293   46    1  25.745         3       0       1
259    19    1  31.920         0       1       1
...   ...  ...     ...       ...     ...     ...
109    63    1  35.090         0       1       2
575    58    0  27.170         0       0       1
535    38    1  28.025         1       0       0
543    54    0  47.410         0       1       2
846    51    0  34.200         1       0       3

[268 rows x 6 columns]

Y_test: 764      9095.06825
887      5272.17580
890     29330.98315
1293     9301.89355
259     33750.29180
           ...     
109     47055.53210
575     12222.89830
535      6067.12675
543     63770.42801
846      9872.70100
Name: charges, Length: 268, dtype: float64
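
To see how `test_size=0.2` produces the 268 test rows shown above, the split can be reproduced on placeholder arrays with the same number of rows as the dataset. This is a minimal sketch: `X_demo` and `y_demo` are made-up stand-ins, not the insurance data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the dataset (1338).
X_demo = np.arange(1338 * 2).reshape(1338, 2)
y_demo = np.arange(1338)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# 20% of 1338 is 267.6, which is rounded up to 268 test rows.
print(len(X_tr), len(X_te))
```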


- Apply feature scaling if needed.

In [21]:
scaler = StandardScaler()
# fit the scaler on the training data only, then reuse its mean and std for the test data
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
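
`StandardScaler` standardizes each feature as $z = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are the mean and population standard deviation computed from the training data. The sketch below reproduces that calculation by hand with NumPy. The arrays hold the `age` and `bmi` values of the first five rows shown earlier, split arbitrarily into four "training" rows and one "test" row for illustration only.

```python
import numpy as np

# age and bmi from the first five rows of the dataset (illustrative split).
X_tr = np.array([[19.0, 27.9], [18.0, 33.77], [28.0, 33.0], [33.0, 22.705]])
X_te = np.array([[32.0, 28.88]])

# Statistics come from the training rows only.
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std  # test rows reuse the training statistics

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
```

Reusing the training statistics on the test rows keeps information about the test set out of the preprocessing step.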

- Build a multiple linear regression model using Scikit-Learn.

In [22]:
# create the linear regression model
model = LinearRegression()

- Train the model on the training data and make predictions on the test data.

In [23]:
# train the model on the training data
model.fit(X_train, y_train)

# predict on the test data
y_pred = model.predict(X_test)
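
A fitted multiple linear regression predicts $\hat{y} = \beta_0 + \sum_j \beta_j x_j$, where $\beta_0$ is stored in `intercept_` and the $\beta_j$ in `coef_`. The sketch below checks this on synthetic, noise-free data; `X_demo`, `y_demo`, and the chosen coefficients are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: the target is an exact linear function of three features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 2.0 + X_demo @ np.array([1.5, -0.5, 3.0])

demo = LinearRegression().fit(X_demo, y_demo)

# predict() is the intercept plus the dot product of features and coefficients.
manual = demo.intercept_ + X_demo @ demo.coef_
print(np.allclose(manual, demo.predict(X_demo)))
print(demo.intercept_, demo.coef_)
```

On standardized features, as in this notebook, the size of each coefficient shows how much the prediction changes per one standard deviation of that feature.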

- Evaluate the model by computing metrics such as R-squared, MSE, and MAE. Display the evaluation results.

In [24]:
# evaluate the model
r_squared = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

# display the evaluation results
print(f'R-squared: {r_squared:.6f}')  # ':2f' set a field width, not a precision
print(f'Mean Squared Error (MSE): {mse}')
print(f'Mean Absolute Error (MAE): {mae}')

R-squared: 0.783346
Mean Squared Error (MSE): 33635210.43117845
Mean Absolute Error (MAE): 4186.508898366439
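
The three metrics can also be computed by hand, which makes their definitions explicit: MSE is the mean squared residual, MAE the mean absolute residual, and $R^2 = 1 - SS_{res} / SS_{tot}$. The sketch below uses only the first five prediction/target pairs printed in this notebook, so its results will differ from the full test-set values above.

```python
import numpy as np

# First five target/prediction pairs from the test set (illustration only).
y_true = np.array([9095.07, 5272.18, 29330.98, 9301.89, 33750.29])
y_hat = np.array([8924.41, 7116.30, 36909.01, 9507.87, 27013.35])

residuals = y_true - y_hat
mse_demo = np.mean(residuals ** 2)      # mean squared error
mae_demo = np.mean(np.abs(residuals))   # mean absolute error
rmse_demo = np.sqrt(mse_demo)           # same units as charges

ss_res = np.sum(residuals ** 2)                  # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
r2_demo = 1 - ss_res / ss_tot

print(f"MSE={mse_demo:.2f} MAE={mae_demo:.2f} RMSE={rmse_demo:.2f} R2={r2_demo:.4f}")
```

Taking the square root of the full test-set MSE reported above gives an RMSE of roughly 5,800, which is easier to compare with the MAE of about 4,187 because both are in the units of `charges`.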


In [25]:
# display the predictions next to the actual targets
print("\nPredicted Personal Medical Costs:")
for i, (pred, target) in enumerate(zip(y_pred, y_test), start=1):
    print(f"Test sample {i}: Prediction {pred:.2f}, Target {target:.2f}")


Predicted Personal Medical Costs:
Test sample 1: Prediction 8924.41, Target 9095.07
Test sample 2: Prediction 7116.30, Target 5272.18
Test sample 3: Prediction 36909.01, Target 29330.98
Test sample 4: Prediction 9507.87, Target 9301.89
Test sample 5: Prediction 27013.35, Target 33750.29
Test sample 6: Prediction 10790.78, Target 4536.26
Test sample 7: Prediction 226.30, Target 2117.34
Test sample 8: Prediction 16942.72, Target 14210.54
Test sample 9: Prediction 1056.63, Target 3732.63
Test sample 10: Prediction 11267.92, Target 10264.44
Test sample 11: Prediction 28048.60, Target 18259.22
Test sample 12: Prediction 9424.36, Target 7256.72
Test sample 13: Prediction 5326.32, Target 3947.41
Test sample 14: Prediction 38460.06, Target 46151.12
Test sample 15: Prediction 40303.41, Target 48673.56
Test sample 16: Prediction 37147.01, Target 44202.65
Test sample 17: Prediction 15287.92, Target 9800.89
Test sample 18: Prediction 35965.05, Target 42969.85
Test sample 19: Prediction 9179.18, Target 8233.10
Test sample 20: Prediction 31510.83, Target 21774.32
Test sample 21: Prediction 3797.79,