# NN Model for TCM Project (pseudo code)

## Steps of CNN

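2. Text processing

    Text vectorization: convert the text data into a numeric representation, e.g. turn Chinese text into word vectors or character vectors.
    Sequence padding: make every text sequence the same length, padding or truncating it as needed.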

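3. Build the CNN model

    Convolution and pooling layers: design convolution and pooling layers that extract features and reduce the dimensionality of the input.
    Flatten and fully connected layers: flatten the pooled features into a vector, then add fully connected (dense) layers for classification.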

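4. Model training and evaluation

    Compile the model: choose the loss function, the optimizer, and the evaluation metrics.
    Train the model: fit the CNN on the processed data; the weights are updated by backpropagation.
    Evaluate the model: measure its performance on the test set with accuracy or other metrics.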

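5. Model tuning and optimization

    Hyperparameter tuning: adjust the number and size of the convolution layers, the choice of pooling layers, and so on, to improve performance.
    Preventing overfitting: use regularization such as dropout layers, so the model does not do well on the training set but poorly on the test set.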
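The two text-processing steps in item 2 can be sketched in plain Python. This is a minimal character-level version under stated assumptions: each record is a short Chinese string, id 0 is reserved for padding (and, for simplicity, also reused for characters not in the vocabulary), and sequences are padded or truncated at the end. The helper names `build_char_vocab` and `texts_to_padded_ids` are hypothetical, not part of this project.

```python
def build_char_vocab(texts):
    """Map every distinct character to an integer id starting at 1 (0 is kept for padding)."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def texts_to_padded_ids(texts, vocab, maxlen):
    """Turn each text into a list of ids, then pad with 0 or truncate to exactly maxlen."""
    sequences = []
    for text in texts:
        ids = [vocab.get(ch, 0) for ch in text][:maxlen]  # unknown chars -> 0; cut long texts
        ids += [0] * (maxlen - len(ids))                     # pad short texts at the end
        sequences.append(ids)
    return sequences


texts = ["頭痛", "頭痛發熱", "咳"]
vocab = build_char_vocab(texts)
padded = texts_to_padded_ids(texts, vocab, maxlen=3)
print(vocab)
print(padded)
```

Every row of `padded` has the same length, which is what a fixed-size convolutional input needs; word-level vectors would follow the same pattern with a tokenizer in place of per-character splitting.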

In [3]:
import sys, os
print(os.path.dirname(sys.executable))



c:\Users\taliah\miniconda3\envs\ml2


In [6]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from sklearn.preprocessing import LabelEncoder
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt

In [7]:
def ReadData(FILENAME):
    data = pd.read_csv(FILENAME)
    # Debug
    print("ReadData:")
    print(f'Shape of data = ({data.shape[0]} rows, {data.shape[1]} cols).')
    return data

# 2. Convert the text columns into numeric values using LabelEncoder.
def TextConvert(data):
    # Body status: 1~3, Diagnosis: 4~7
    for col in data.columns[1:8]:
        # Encode each column's categories as integers 0..n-1;
        # astype(str) also turns missing values into their own 'nan' category
        data[col] = LabelEncoder().fit_transform(data[col].astype(str))
    return data


# 3. Split the data into the status/diagnoses/symptoms and the prescriptions.
def SplitXY(data):
    # Body status: 1~3, Diagnosis: 4~7, Symptom: 11~124
    # Prescription: 125~226
    # range() excludes its end value, so range(11, 125) covers symptom columns 11~124
    split_X = list(range(1, 8)) + list(range(11, 125))
    split_Y = list(range(125, 227))
    # iloc[1:] skips the first row after the CSV header
    X = data.iloc[1:, split_X]
    y = data.iloc[1:, split_Y]
    # Debug
    print("SplitXY:")
    print(f'Shape of X = ({X.shape[0]} rows, {X.shape[1]} cols). First 10 rows x 10 cols of X:')
    print(X.iloc[:10, :10])
    print(f'Shape of y = ({y.shape[0]} rows, {y.shape[1]} cols). First 10 rows x 10 cols of y:')
    print(y.iloc[:10, :10])
    return X, y


# 4. Drop the Chinese herbal medicines (CHMs) that are used too rarely.
def DeleteMedicine(y, threshold=10):
    # Keep only the herb columns whose total across all prescriptions is at least `threshold`
    y = y.loc[:, y.sum() >= threshold]
    return y
            

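# 5. Split the data into training data and held-out test (validation) data.
def SplitTrainValid(X, y):
    state = 114514
    # No stratify=y: with a multi-label y, every distinct row of herb labels counts as one class,
    # and train_test_split raises a ValueError if any such combination occurs only once.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=state)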
    # Debug
    print("SplitTrainValid:")
    print(f'shape of X_train is ({X_train.shape[0]}, {X_train.shape[1]}).')
    print(f'shape of X_test is ({X_test.shape[0]}, {X_test.shape[1]}).')
    print(f'shape of y_train is ({y_train.shape[0]}, {y_train.shape[1]}).')
    print(f'shape of y_test is ({y_test.shape[0]}, {y_test.shape[1]}).')
    
    return X_train, X_test, y_train, y_test
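Stratifying on the prescription matrix is fragile: when `y` is 2-D, scikit-learn's stratified splitting treats each distinct row (each full combination of herb labels) as one class, and it raises a `ValueError` if any class has fewer than two members. The toy sketch below, with made-up data and a hypothetical helper `label_combination_counts`, counts label combinations the same way to show how quickly multi-label rows become one-off classes.

```python
from collections import Counter


def label_combination_counts(y_rows):
    """Count how often each full 0/1 label row occurs, i.e. one 'class' per distinct
    combination, which is how stratification groups a 2-D label matrix."""
    return Counter(tuple(row) for row in y_rows)


# Toy multi-label matrix: 5 prescriptions x 4 herbs (1 = herb used)
y_toy = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
counts = label_combination_counts(y_toy)
singletons = [combo for combo, n in counts.items() if n < 2]
print(counts)
print(f"{len(singletons)} of {len(counts)} combinations occur only once")
```

Even with only 4 herbs, most combinations are unique; with about a hundred herb columns, a single unique prescription is enough to make the stratified split fail.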

In [9]:


# Step 2 functions
FILENAME = './process_data.csv'
data = ReadData(FILENAME)
data = TextConvert(data)


ReadData:
Shape of data = (797 rows, 227 cols).
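The 227 columns reported above can be checked against the index ranges quoted in the `SplitXY` comments (body status 1~3, diagnosis 4~7, symptoms 11~124, prescriptions 125~226). This sketch only does index arithmetic on those ranges and assumes the comments describe the CSV correctly. Because Python's `range(a, b)` stops at `b - 1`, each upper bound must be one past the last column, otherwise a column can land in both X and y.

```python
# Column layout taken from the comments in SplitXY (0-based positions in the CSV)
N_COLS = 227
body_status = list(range(1, 4))           # columns 1~3
diagnosis = list(range(4, 8))             # columns 4~7
symptoms = list(range(11, 125))           # columns 11~124
prescription = list(range(125, N_COLS))   # columns 125~226

features = body_status + diagnosis + symptoms
overlap = set(features) & set(prescription)
print(len(features), len(prescription), sorted(overlap))

# A symptom range written as range(11, 126) would also take column 125,
# the first prescription column, so that herb would appear in both X and y.
leaky = set(range(11, 126)) & set(prescription)
print(sorted(leaky))
```

With these ranges X has 121 feature columns and y has 102 herb columns, and they share no column.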


In [None]:
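from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Dropout, Flatten, Dense

# Each sample is a 1-D vector of status/diagnosis/symptom columns, so 1-D convolutions are used.
# Assumes X_train / y_train come from SplitXY -> DeleteMedicine -> SplitTrainValid.
n_features = X_train.shape[1]   # number of input columns
num_classes = y_train.shape[1]  # number of herb (prescription) columns to predict
# Conv1D expects (samples, steps, channels); reshape X before calling model.fit, e.g.
# X_train_cnn = X_train.to_numpy(dtype='float32').reshape(-1, n_features, 1)

# Create a sequential model
model = Sequential()
model.add(Input(shape=(n_features, 1)))

# Add convolution and pooling layers
model.add(Conv1D(filters=32, kernel_size=3, activation='relu'))
model.add(MaxPooling1D(pool_size=2))

# Add more convolution and pooling layers (as needed)
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(2))

# Randomly drop a fraction of the neurons during training to reduce overfitting
model.add(Dropout(0.25))

# Flatten the features into a 1-D vector
model.add(Flatten())

# Add fully connected layers
model.add(Dense(128, activation='relu'))
# One sigmoid per herb: a prescription can contain several herbs, so each output is an
# independent yes/no probability (multi-label) rather than a softmax over exclusive classes
model.add(Dense(num_classes, activation='sigmoid'))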
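Choosing kernel and pool sizes is easier once you trace how long each feature map is. With Keras's default `padding='valid'` and stride 1, a convolution with kernel size k maps a dimension of length L to L - k + 1, and max pooling with pool size p (whose stride defaults to p) maps it to floor((L - p) / p) + 1. The sketch below applies this to the conv(3), pool(2), conv(3), pool(2) stack above along one dimension. The input width of 121 is an assumption (7 status/diagnosis columns plus 114 symptom columns); substitute the real `X_train.shape[1]`.

```python
def conv_out(length, kernel_size, stride=1):
    """Output length of a 'valid' (unpadded) convolution along one dimension."""
    return (length - kernel_size) // stride + 1


def pool_out(length, pool_size):
    """Output length of 'valid' max pooling whose stride equals the pool size."""
    return (length - pool_size) // pool_size + 1


def feature_map_length(n_inputs):
    """Trace one dimension through Conv(3) -> Pool(2) -> Conv(3) -> Pool(2)."""
    length = conv_out(n_inputs, 3)
    length = pool_out(length, 2)
    length = conv_out(length, 3)
    length = pool_out(length, 2)
    return length


n_features = 121             # assumption: 7 status/diagnosis + 114 symptom columns
steps = feature_map_length(n_features)
flattened = steps * 64       # 1-D input: 64 filters in the second convolution layer
print(steps, flattened)
```

Under this assumption the Flatten layer hands a vector of length 28 x 64 to the first Dense layer; if the trace ever reaches 0 or below, the stack is too deep for the input width.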


In [None]:
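# Compile the model

# binary_crossentropy scores each herb column as an independent yes/no label, which fits
# multi-label targets with a sigmoid output; categorical_crossentropy assumes one class per sample
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Parameters to tune:
# optimizer: the algorithm that updates the weights (e.g. 'adam', 'sgd', 'rmsprop')
# loss: the loss function that measures the error between predicted and actual values
# metrics: the evaluation metrics used to monitor the model's performance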
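The choice of loss matters here because one prescription contains several herbs at once. The NumPy sketch below writes out the textbook formulas for both losses by hand (they mirror what Keras's `binary_crossentropy` and `categorical_crossentropy` compute, but this is not Keras's own code) on a made-up 4-herb example. With softmax, the outputs must share one unit of probability mass, so two correct herbs cannot both be predicted with high confidence; sigmoid outputs score each herb independently.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over labels of -[y log p + (1 - y) log(1 - p)]; every label is independent."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """-sum(y log p) over classes; assumes y_prob is one distribution summing to 1."""
    p = np.clip(y_prob, eps, 1.0)
    return float(-np.sum(y_true * np.log(p)))


# Made-up target: one prescription that uses herbs 0 and 2 out of 4
y_true = np.array([1.0, 0.0, 1.0, 0.0])

sigmoid_out = np.array([0.9, 0.1, 0.8, 0.2])      # independent per-herb probabilities
softmax_out = np.array([0.45, 0.05, 0.45, 0.05])  # probabilities forced to sum to 1

bce = binary_crossentropy(y_true, sigmoid_out)
cce = categorical_crossentropy(y_true, softmax_out)
print(f"binary cross-entropy (sigmoid):      {bce:.4f}")
print(f"categorical cross-entropy (softmax): {cce:.4f}")
```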


In [None]:
# Show the model summary
model.summary()
train_history = model.fit(......)

# Plot the training history
show_train_history(.....)
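`show_train_history` is called above but never defined in this notebook. One possible version, sketched under the assumption that it should plot a training metric against its validation counterpart, takes the `History.history` dict returned by Keras's `model.fit`. The function body, its parameter names, and the sample numbers are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt


def show_train_history(history, train_key="accuracy", val_key="val_accuracy"):
    """Plot a training metric and its validation counterpart per epoch.

    `history` is a dict shaped like Keras's History.history,
    e.g. {"accuracy": [...], "val_accuracy": [...]}.
    """
    fig, ax = plt.subplots()
    ax.plot(history[train_key], label="train")
    if val_key in history:  # validation curves exist only if fit() got validation data
        ax.plot(history[val_key], label="validation")
    ax.set_title("Train history")
    ax.set_xlabel("Epoch")
    ax.set_ylabel(train_key)
    ax.legend()
    return fig


# Made-up numbers standing in for train_history.history
fake_history = {"accuracy": [0.60, 0.72, 0.78], "val_accuracy": [0.58, 0.66, 0.69]}
fig = show_train_history(fake_history)
```

In the notebook it would be called as `show_train_history(train_history.history)`, or with `"loss", "val_loss"` to plot the loss curves; a widening gap between the two curves is the usual sign of the overfitting that step 5 warns about.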