## **🧩 Algorithm Selection for Custom-Filter Recommendations**

🛠️ **Strategy**: This mechanism takes custom filters from the user, such as _Growth_, _Soil_, _Sunlight_, _Watering_, and _Fertilization Type_. These inputs are **descriptive text**, so they are represented as **TF-IDF vectors**, which measure how similar plants are by the words their feature descriptions share (lexical overlap rather than deeper semantic meaning).
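To make the TF-IDF representation concrete, here is a minimal pure-Python sketch of TF-IDF plus cosine similarity. It assumes scikit-learn's default `TfidfVectorizer` settings (lowercasing, tokens of two or more word characters, smoothed IDF, L2 normalization); the three description strings are hypothetical examples in the format of the combined filter text, not rows from the dataset. It shows that two plants score as similar only when their descriptions share words.

```python
import math
import re
from collections import Counter

# Same token rule as scikit-learn's TfidfVectorizer default: words of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def tfidf_vectors(docs):
    """Smoothed TF-IDF with L2 normalization (scikit-learn's default settings)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    vocab = sorted({t for toks in tokenized for t in toks})
    # Document frequency: in how many documents each term appears
    doc_freq = {t: sum(t in toks for toks in tokenized) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def cosine(u, v):
    # Vectors are already L2-normalized, so the dot product is the cosine similarity
    return sum(a * b for a, b in zip(u, v))

# Hypothetical combined filter strings
docs = [
    "fast well-drained full sunlight Keep soil evenly moist Organic",
    "moderate well-drained partial sunlight Keep soil evenly moist Organic",
    "slow clay shade Water weekly Low-nitrogen",
]
vocab, vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # many shared words -> high similarity
print(round(cosine(vecs[0], vecs[2]), 3))  # no shared words -> 0.0
```

Note that `well-drained` is split into the tokens `well` and `drained`, so hyphenated filter values still match word by word.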

🧠 **Decision**: A **Neural Network (MLP via TensorFlow)** was chosen because it:
- Handles high-dimensional **TF-IDF representations** without losing flexibility.
- Can learn **nonlinear patterns** from combinations of irregular, sparse features.
- Suits a dynamic system better, since it can be improved later by retraining or fine-tuning.
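The reasons above rest on how an MLP turns a TF-IDF row into class probabilities. The following NumPy sketch runs one forward pass through layers with the same widths as the notebook's model (128 and 64 ReLU units, then softmax). The weights are random rather than trained, and the input size (300) and class count (120) are hypothetical, so the probabilities themselves are meaningless; the point is the shape of the computation and that softmax yields one probability per plant, which can then be ranked.

```python
import numpy as np

rng = np.random.default_rng(42)
n_features, n_classes = 300, 120  # hypothetical: TF-IDF vocabulary size and number of plants

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Randomly initialized (untrained) weights with the notebook's layer widths
W1, b1 = rng.normal(0, 0.05, (n_features, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, (128, 64)), np.zeros(64)
W3, b3 = rng.normal(0, 0.05, (64, n_classes)), np.zeros(n_classes)

def forward(x):
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)

# A sparse, L2-normalized input, like one TF-IDF row with three nonzero terms
x = np.zeros((1, n_features))
x[0, [3, 17, 42]] = 1 / np.sqrt(3)

probs = forward(x)[0]
print(probs.shape, round(float(probs.sum()), 6))
print(probs.argsort()[::-1][:3])  # indices of the three most likely classes
```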

📊 **Alternatives**: Random Forest is less suited to sparse vectors because each split looks at a single feature, and with many mostly-zero TF-IDF columns the trees can grow overly complex structures. KNN is a poor fit because distances become less informative in high-dimensional spaces (the curse of dimensionality), and as a lazy learner it does all its work at prediction time by comparing each query against the stored training data.
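The curse-of-dimensionality argument against KNN can be illustrated numerically. This NumPy sketch uses uniformly random dense points and Euclidean distance (an assumption for illustration, not the actual TF-IDF data) and measures the relative gap between a query's nearest and farthest neighbor. As the dimension grows, the gap shrinks, so "nearest" carries less information.

```python
import numpy as np

def distance_contrast(dim, n_points=500, seed=0):
    """(max - min) / min Euclidean distance from one random query to random points in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    print(dim, round(float(distance_contrast(dim)), 3))
```

The exact numbers depend on the random seed; the downward trend as `dim` increases is the relevant part.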

---

### 📋 Algorithm Comparison Table

| Evaluation Aspect              | ✅ MLP (TensorFlow)                 | 🟡 Random Forest           | 🔴 K-Nearest Neighbors (KNN)        |
|--------------------------------|-------------------------------------|----------------------------|-------------------------------------|
| Suited to TF-IDF input         | ✔️ Very good                        | ❌ Less efficient          | ❌ Inefficient                      |
| Handles sparse data            | ✔️ Yes                              | ❌ Tends to be inefficient | ❌ Not suitable                     |
| Copes with high dimensionality | ✔️ Designed for it                  | ❌ Prone to overfitting    | ❌ Very limited                     |
| Multiclass support             | ✔️ Native (softmax output)          | ✔️ Yes, but heavy          | ⚠️ Less scalable                    |
| Room for model improvement     | ✔️ Can be retrained and improved    | ⚠️ Limited                 | ❌ No trainable parameters to improve |

---

📌 **Conclusion**: A Neural Network (MLP via TensorFlow) is the most adaptive and scalable choice for a mechanism driven by text input and highly sparse vectors such as TF-IDF, especially in a dynamic recommender system based on user filters.

## **1️⃣ Install & Import Libraries**

```python
# Install dependencies (notebook shell command)
!pip install gdown scikit-learn tensorflow --quiet

import pandas as pd
import numpy as np
import gdown
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.utils import to_categorical
```

## **2️⃣ Download & Load Dataset PlansAndFamily**

```python
# Download the dataset from Google Drive and load it
url = "https://drive.google.com/uc?id=1nXnJKn-3TCXBW3URPzmRc7WH3cYQQ6Fe"
gdown.download(url, "PlansAndFamily.csv", quiet=False)
df = pd.read_csv("PlansAndFamily.csv")
```



## **3️⃣ Preprocess Combined Features**

In [None]:
df = df.dropna(subset=["Plant Name"])
filter_cols = ["Growth", "Soil", "Sunlight", "Watering", "Fertilization Type"]
df['combined'] = df[filter_cols].fillna("").agg(' '.join, axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['combined'] = df[filter_cols].fillna("").agg(' '.join, axis=1)
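The combining step can be checked on a tiny DataFrame with the same column names. The rows are hypothetical, not taken from `PlansAndFamily.csv`. The sketch shows that the row without a plant name is dropped and that a row with a missing filter value gets an extra space in `combined`, which is harmless because the TF-IDF tokenizer ignores whitespace.

```python
import pandas as pd

# Hypothetical rows using the dataset's column names
toy = pd.DataFrame({
    "Plant Name": ["Basil", None, "Fern"],
    "Growth": ["fast", "slow", None],
    "Soil": ["well-drained", "loamy", "moist"],
    "Sunlight": ["full sunlight", "shade", "indirect sunlight"],
    "Watering": ["Keep soil evenly moist", "Water weekly", "Water weekly"],
    "Fertilization Type": ["Organic", None, "Low-nitrogen"],
})
cols = ["Growth", "Soil", "Sunlight", "Watering", "Fertilization Type"]

# Independent copy, so adding a column raises no SettingWithCopyWarning
toy = toy.dropna(subset=["Plant Name"]).copy()
# astype(str) guards against non-string columns before joining
toy["combined"] = toy[cols].fillna("").astype(str).agg(" ".join, axis=1)
print(toy["combined"].tolist())
```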


## **4️⃣ Encode Target Labels**

```python
# Map each plant name to an integer class id (one class per distinct plant)
le = LabelEncoder()
df['label'] = le.fit_transform(df['Plant Name'])
```
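A small illustration of how `LabelEncoder` assigns class ids, using hypothetical plant names: classes are sorted alphabetically, duplicates share one id, and `classes_[i]` returns the same name as `inverse_transform([i])`. Because every distinct plant name becomes its own class, the model's output layer has one unit per plant.

```python
from sklearn.preprocessing import LabelEncoder

names = ["Pansy", "Basil", "Lobelia", "Basil"]  # hypothetical plant names
enc = LabelEncoder()
labels = enc.fit_transform(names)

print(enc.classes_)  # ['Basil' 'Lobelia' 'Pansy'] (sorted)
print(labels)        # [2 0 1 0]
# Decoding an id: both forms return the same plant name
print(enc.inverse_transform([2])[0], enc.classes_[2])
```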

## **5️⃣ TF-IDF Vectorization**

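```python
# Learn the vocabulary from the combined text; densify for the Keras input layer
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(df['combined']).toarray()
# One-hot labels, as required by the categorical_crossentropy loss used below
y = to_categorical(df['label'])
```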

## **6️⃣ Train 📦 TensorFlow Classifier (MLP)**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Sequential([
    Input(shape=(X.shape[1],)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(y.shape[1], activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=16, validation_split=0.1, verbose=0)

<keras.src.callbacks.history.History at 0x785f8831e010>
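`classification_report` is imported in step 1 but never used, and the prediction function in step 7 returns the top three plants, so a top-k hit rate is a natural extra check on the trained model. This NumPy sketch assumes one-hot `y_true` and a probability matrix like the one `model.predict` returns; the arrays here are small hypothetical values.

```python
import numpy as np

def top_k_accuracy(y_true_onehot, probs, k=3):
    """Share of rows whose true class is among the k highest-probability classes."""
    true_idx = y_true_onehot.argmax(axis=1)
    top_k = np.argsort(probs, axis=1)[:, ::-1][:, :k]  # best k class indices per row
    hits = (top_k == true_idx[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical: 3 samples, 4 classes
y_true = np.eye(4)[[0, 2, 3]]
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.4, 0.3, 0.2, 0.1],
])
acc1 = top_k_accuracy(y_true, probs, k=1)
acc3 = top_k_accuracy(y_true, probs, k=3)
print(round(acc1, 3), round(acc3, 3))
```

In the notebook this could be applied as `top_k_accuracy(y_test, model.predict(X_test, verbose=0), k=3)`. One caveat worth checking first with `df['Plant Name'].value_counts()`: if each plant name occurs only once, the plants placed in `X_test` never appear as training targets, so test-set scores would be expected to be close to zero regardless of model quality.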

## **7️⃣ Prediction Function Based on User Filters**

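```python
# Plants already recommended, so repeated calls return new suggestions
recommendation_history_custom = set()

def recommend_plants_by_filter(user_filters: dict, top_n=3, reset=False):
    global recommendation_history_custom

    if reset:
        recommendation_history_custom = set()

    # Keep only filters that are non-empty strings and not "none"
    selected_features = [
        v for v in user_filters.values()
        if isinstance(v, str) and v.strip() and v.strip().lower() != 'none'
    ]
    if not selected_features:
        return ["No filter selected"]

    # Vectorize the query with the fitted TF-IDF vocabulary
    user_query = ' '.join(selected_features)
    user_vector = tfidf.transform([user_query]).toarray()

    # Rank all plants by predicted probability, highest first
    probs = model.predict(user_vector, verbose=0)[0]
    ranked_names = le.classes_[probs.argsort()[::-1]]  # classes_[i] == inverse_transform([i])[0]

    # Take the first top_n plants that were not recommended before
    top_plants = []
    for name in ranked_names:
        if name not in recommendation_history_custom:
            top_plants.append(name)
            recommendation_history_custom.add(name)
        if len(top_plants) == top_n:
            break

    # Look up plant details and keep them in ranked order (isin() alone loses the ranking)
    rank = {name: i for i, name in enumerate(top_plants)}
    result = df[df['Plant Name'].isin(top_plants)].drop_duplicates('Plant Name')
    result = result.sort_values('Plant Name', key=lambda s: s.map(rank))
    return result[['Plant Name'] + filter_cols]
```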
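The history mechanism in `recommend_plants_by_filter` can be isolated into a pure-Python sketch. `next_unseen` is a hypothetical helper that mirrors the function's loop over a fixed, hypothetical ranking: repeated calls return the next unseen plants, and clearing the set plays the role of `reset=True`.

```python
def next_unseen(ranked_names, history, top_n=3):
    """Hypothetical helper: first top_n names not yet in history (history is updated in place)."""
    picked = []
    for name in ranked_names:
        if name not in history:
            picked.append(name)
            history.add(name)
        if len(picked) == top_n:
            break
    return picked

ranked = ["Basil", "Pansy", "Lobelia", "Mint", "Fern"]  # hypothetical ranking, best first
history = set()
first = next_unseen(ranked, history)
second = next_unseen(ranked, history)
history.clear()  # equivalent of reset=True
third = next_unseen(ranked, history)
print(first)   # ['Basil', 'Pansy', 'Lobelia']
print(second)  # ['Mint', 'Fern']
print(third)   # ['Basil', 'Pansy', 'Lobelia']
```

Because `recommendation_history_custom` is a single global set, plants shown for one filter combination are also skipped for later, different filter combinations until `reset=True` is passed.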

## **8️⃣ Example Usage**

```python
# "none" marks a filter the user left unset
user_filters = {
    "Growth": "none",
    "Soil": "well-drained",
    "Sunlight": "full sunlight",
    "Watering": "Keep soil evenly moist",
    "Fertilization Type": "Organic"
}
recommend_plants_by_filter(user_filters, reset=False)
```

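|    | Plant Name | Growth   | Soil         | Sunlight          | Watering               | Fertilization Type |
|----|------------|----------|--------------|-------------------|------------------------|--------------------|
| 1  | Basil      | fast     | well-drained | full sunlight     | Keep soil evenly moist | Organic            |
| 42 | Pansy      | moderate | well-drained | partial sunlight  | Keep soil evenly moist | Organic            |
| 91 | Lobelia    | fast     | well-drained | indirect sunlight | Water weekly           | Low-nitrogen       |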


## **9️⃣ Save the Model (.h5) and Encoders (.pkl)**

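```python
# Uncomment to save the trained artifacts.
# Recent Keras versions treat .h5 as a legacy format and recommend the native .keras format.
# model.save("RekomendasibyCustom_model.h5")

# # Save the TF-IDF vectorizer and LabelEncoder
# import pickle

# with open("tfidf_vectorizer.pkl", "wb") as f:
#     pickle.dump(tfidf, f)

# with open("label_encoder.pkl", "wb") as f:
#     pickle.dump(le, f)
```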
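Once the pickles are written, they can be loaded back with `pickle.load`. This sketch checks the round trip in memory with `pickle.dumps`/`pickle.loads` on a small vectorizer and encoder fitted to hypothetical data: the restored vectorizer produces identical vectors, which matters because user queries must be transformed with exactly the vocabulary the model was trained on.

```python
import pickle

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical training text and plant names
docs = ["fast well-drained full sunlight Organic", "slow clay shade Low-nitrogen"]
tfidf = TfidfVectorizer().fit(docs)
le = LabelEncoder().fit(["Basil", "Pansy"])

# Serialize and restore in memory (the notebook writes them to .pkl files instead)
tfidf_restored = pickle.loads(pickle.dumps(tfidf))
le_restored = pickle.loads(pickle.dumps(le))

query = ["well-drained full sunlight"]
same = np.allclose(tfidf.transform(query).toarray(), tfidf_restored.transform(query).toarray())
print(same)                          # True
print(le_restored.classes_.tolist())  # ['Basil', 'Pansy']
```

The Keras model itself would be restored with `tf.keras.models.load_model(...)` from the path it was saved to, not with pickle.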