## Generating Training Datasets
To generate training datasets from Excel files, use the `generate_training_dataset_from_excel` function and provide the file path, the name of the column that holds the amino acid sequence, the output path for the dataset, and the sheet to read (`sheet_name`). For CSV files, use the `generate_training_dataset_from_csv` function and provide the file path, the sequence column name, and the output path.

In [None]:
from integrated import generate_training_dataset_from_csv, generate_training_dataset_from_excel

name_list = ["HC", "LC"]

for name in name_list:
    for i in range(3):
        generate_training_dataset_from_excel(
            "antibody_A/{}.xlsx".format(name),
            "AAsequence",
            "antibody_A/{}_{}_training_datasets".format(name, i),
            sheet_name=i
        )

name_list = ['antibody_B', 'antibody_C', 'antibody_D', 'antibody_E']

for name in name_list:
    generate_training_dataset_from_csv(
        "PATH/{}.csv".format(name),
        "AAsequence",
        "PATH/{}_training_datasets".format(name)
    )
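
Before generating datasets, it can help to confirm that each CSV file actually contains the sequence column and that the sequences use only standard residue letters. The following is a minimal sketch using only the standard library. The helper `find_invalid_sequences` is hypothetical and not part of the `integrated` module, and restricting sequences to the 20 canonical one-letter codes is an assumption that may be too strict for some data.

```python
import csv

# Standard one-letter codes for the 20 canonical amino acids.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_sequences(csv_path, column="AAsequence"):
    """Return (row_number, sequence) pairs that are empty or contain non-standard letters.

    Hypothetical helper for illustration; it is not part of `integrated`.
    """
    invalid = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {csv_path}")
        # Row 1 is the header, so data rows start at 2.
        for row_number, row in enumerate(reader, start=2):
            sequence = (row[column] or "").strip().upper()
            if not sequence or set(sequence) - VALID_RESIDUES:
                invalid.append((row_number, sequence))
    return invalid
```

Calling `find_invalid_sequences("PATH/antibody_B.csv")` returns an empty list when every row passes the check, and raises `KeyError` when the `AAsequence` column is missing.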


## Creating Antibody Objects from Training Datasets

To create antibody objects from training datasets, use the `AntibodyCls` class. Specify the training dataset path, mutation count column, bin number (if applicable), and name for each antibody object.

In [None]:
from integrated import AntibodyCls

antibody = {}

name_list = ["HC_0", "HC_1", "HC_2", "LC_0", "LC_1", "LC_2"]

for name in name_list:
    antibody['antibody_A_'+name] = AntibodyCls(
        "antibody_A/{}_training_datasets".format(name),
        mutation_count_column='mutation_count',
        name=name
    )

    antibody['antibody_A_'+name].generate_training_array()

name_list = ['antibody_B', 'antibody_C', 'antibody_D', 'antibody_E']

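for name in name_list:
    # Key each object by its plain name so the lookups in the training and
    # testing cells (e.g. antibody['antibody_B']) resolve.
    antibody[name] = AntibodyCls(
        # Same placeholder directory the CSV datasets were written to.
        "PATH/{}_training_datasets".format(name),
        mutation_count_column='mutation_count',
        bin_num=4,
        name=name
    )

    antibody[name].generate_training_array()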

## Training Models

To train a random forest model without hyperparameter tuning, use the `train_rf` function and provide a list of antibody objects. To train a random forest model with hyperparameter tuning, pass `tune_hyper=True` together with a `parameters` dictionary that maps each hyperparameter to the list of values to try.

In [None]:
from integrated import train_rf, train_NN

train_name_list = ['antibody_A_HC_0', 'antibody_A_HC_1', 'antibody_A_HC_2', 'antibody_A_LC_0',
                'antibody_A_LC_1', 'antibody_A_LC_2', 'antibody_B', 'antibody_C']
antibody_train_list = [antibody[key] for key in train_name_list]

test_name_list = ['antibody_D', 'antibody_E']
antibody_test_list = [antibody[key] for key in test_name_list]

In [None]:
clf = train_rf(antibody_train_list)

In [None]:
parameters = {'max_depth': [10, 20, 50]}
clf_ht = train_rf(
    antibody_train_list,
    tune_hyper=True,
    parameters=parameters
)
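
To make the shape of the `parameters` dictionary concrete, the sketch below expands such a dictionary into the individual candidate settings. It assumes the dictionary is read as a grid in the style of scikit-learn's `GridSearchCV`, which the source does not confirm for `train_rf`. The helper `expand_grid` is hypothetical and only illustrates the idea.

```python
from itertools import product


def expand_grid(parameters):
    """Yield one dict per combination of the listed hyperparameter values.

    Hypothetical helper; assumes grid-style interpretation of the dictionary.
    """
    names = sorted(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


parameters = {"max_depth": [10, 20, 50]}
candidates = list(expand_grid(parameters))
# One candidate per max_depth value: three settings in total.
```

Under that assumption, adding a second key multiplies the number of candidates, so a dictionary with three `max_depth` values and two values for another parameter would give six settings to evaluate.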

To train a neural network model, use the `train_NN` function and provide a list of antibody objects, a record name, batch size, learning rate, and neural network architecture. The function saves training metrics, TensorBoard logs, checkpoints, and models to corresponding folders.

In [None]:
for N_nodes in [512]:
    train_NN(
        antibody_list=antibody_train_list,
        record_name='{}_lr.003_batch10000'.format(N_nodes),
        batch_size=10000,
        lr=.003,
        NN_architecture=[1280, N_nodes, 2],
        num_shuffle=1
    )
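
As a rough guide to model size, the sketch below counts the trainable parameters implied by a layer-size list such as `[1280, 512, 2]`. It assumes the list describes consecutive fully connected layers with bias terms, which the source does not state explicitly. The function `count_dense_parameters` is hypothetical and not part of `integrated`.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    Assumes each consecutive pair (n_in, n_out) is one dense layer with a bias.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


print(count_dense_parameters([1280, 512, 2]))  # 656898
```

Under these assumptions, the first layer contributes 1280 × 512 + 512 = 655,872 parameters and the output layer 512 × 2 + 2 = 1,026.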

## Testing Models

To test a trained model, use the `test_rf` or `test_NN` function and provide either the trained classifier or the file path to the trained classifier, along with a list of antibody objects to test. `test_rf` outputs the F1 score for a random forest model, while `test_NN` outputs the F1 score, precision, and recall for a neural network model.

In [None]:
from integrated import test_rf, test_NN

test_rf(clf, antibody_test_list)

In [None]:
test_NN('512_lr.003_batch10000_r1.pth', antibody_test_list)
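
For reference, the sketch below shows how precision, recall, and the F1 score relate to each other for binary labels. It assumes a single positive class labeled `1`. Whether `test_rf` and `test_NN` compute their scores this way or use some form of averaging is not confirmed by the source, and `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

In this small example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.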

## Conclusion

In this notebook, we generated training datasets from Excel and CSV files using the `generate_training_dataset_from_excel` and `generate_training_dataset_from_csv` functions. We then created antibody objects from these datasets using the `AntibodyCls` class and trained random forest and neural network models using the `train_rf` and `train_NN` functions. Finally, we tested these models using the `test_rf` and `test_NN` functions.