# Quick Start and Training Configuration


## Quick Start

### 1. Download Operator

In [None]:
! git clone https://towhee.io/towhee/resnet-image-embedding.git
%cd resnet-image-embedding
! ls

Then run the Python scripts in the following steps to train and test a Towhee operator.

### 2. Setup Operator

Create the operator and load the model by name.

In [None]:
from resnet_image_embedding import ResnetImageEmbedding
from towhee.trainer.training_config import TrainingConfig
from torchvision import transforms
from towhee import dataset
op = ResnetImageEmbedding('resnet18', num_classes=10)

### 3. Configure Trainer

Modify training configurations on top of default values.

In [None]:
# build a training config:
training_config = TrainingConfig(
    batch_size=2,
    epoch_num=2,
    output_dir='quick_start_output'
)

### 4. Prepare Dataset

The example here uses a fake dataset for both training and evaluation.

In [None]:
# prepare the dataset
fake_transform = transforms.Compose([transforms.ToTensor()])
train_data = dataset('fake', size=20, transform=fake_transform)
eval_data = dataset('fake', size=10, transform=fake_transform)

### 5. Start Training

Now that everything is ready, start training.

In [None]:
op.train(
    training_config,
    train_dataset=train_data,
    eval_dataset=eval_data
)

After a successful training run, you will see a progress bar below and a `quick_start_output` folder containing the training results.
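
To check what the run wrote, a small helper like the one below can list the files under the output folder. This is a minimal sketch using only the standard library; `list_training_outputs` is a hypothetical helper name, not part of Towhee, and the exact files you see depend on your Towhee version and config.

```python
from pathlib import Path


def list_training_outputs(output_dir):
    """Return the relative paths of all files under output_dir, sorted."""
    root = Path(output_dir)
    if not root.is_dir():
        # The folder does not exist yet, e.g. training has not been run.
        return []
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob('*')
        if p.is_file()
    )


# 'quick_start_output' is the output_dir set in the training config above.
for path in list_training_outputs('quick_start_output'):
    print(path)
```

If nothing is printed, the folder is missing or empty, which usually means training did not finish or `output_dir` points somewhere else.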

## Configure Training

### 1. Default Configs
You can dump the default training configs, or write customized training configs, to a yaml file.

In [None]:
from towhee.trainer.training_config import dump_default_yaml, TrainingConfig
default_config_file = 'default_training_configs.yaml'
dump_default_yaml(default_config_file)

Open `default_training_configs.yaml` to see the default config structure, which looks like this:
```yaml
train:
    output_dir: ./output_dir
    overwrite_output_dir: true
    eval_strategy: epoch
    eval_steps:
    batch_size: 8
    val_batch_size: -1
    seed: 42
    epoch_num: 2
    dataloader_pin_memory: true
    dataloader_drop_last: true
    dataloader_num_workers: 0
    load_best_model_at_end: false
    freeze_bn: false
device:
    device_str:
    sync_bn: false
logging:
    print_steps:
learning:
    lr: 5e-05
    loss: CrossEntropyLoss
    optimizer: Adam
    lr_scheduler_type: linear
    warmup_ratio: 0.0
    warmup_steps: 0
callback:
    early_stopping:
        monitor: eval_epoch_metric
        patience: 4
        mode: max
    model_checkpoint:
        every_n_epoch: 1
    tensorboard:
        log_dir:
        comment: ''
metrics:
    metric: Accuracy
```
The yaml file corresponds to a `TrainingConfig` instance, so it can be loaded, modified, and saved back.
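
As a rough picture of that correspondence, the sketch below treats the top-level keys (`train`, `learning`, `callback`, ...) as section names that only group fields, and collects the fields beneath them into one flat mapping. This matches how `output_dir` is set directly on the config instance while it sits under `train:` in the file, but it is an illustration only: `flatten_sections` is a hypothetical helper, not Towhee's loader, and the dict is a trimmed, hand-written copy of the dumped structure.

```python
# A trimmed copy of the dumped structure, written as the Python dict it parses to.
yaml_sections = {
    'train': {'output_dir': './output_dir', 'batch_size': 8, 'epoch_num': 2},
    'learning': {'lr': 5e-05, 'optimizer': 'Adam'},
    'callback': {
        'early_stopping': {
            'monitor': 'eval_epoch_metric',
            'patience': 4,
            'mode': 'max',
        },
    },
}


def flatten_sections(sections):
    """Drop the top-level section names and collect the fields beneath them."""
    fields = {}
    for section_name, section_fields in sections.items():
        for field_name, value in section_fields.items():
            if field_name in fields:
                raise ValueError(
                    f'duplicate field {field_name!r} in section {section_name!r}'
                )
            fields[field_name] = value
    return fields


fields = flatten_sections(yaml_sections)
print(fields['batch_size'])      # a field listed under `train`
print(fields['early_stopping'])  # a dict-valued field under `callback`, kept whole
```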

In [None]:
training_configs = TrainingConfig().load_from_yaml(default_config_file)
print(training_configs)
training_configs.output_dir = 'my_test_output'
training_configs.save_to_yaml('my_test_config.yaml')

Open `my_test_config.yaml`, and you will find that `output_dir` has been modified:
```yaml
train:
    output_dir: my_test_output
```
So there are two ways to set up the configs: one is through the `TrainingConfig` class, the other is by editing the yaml file.
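
To confirm which lines differ between the default file and the modified one, a plain text diff is often enough. The following is a minimal sketch built on the standard library's `difflib`; `diff_config_files` is a hypothetical helper, and the real files may differ in more lines than `output_dir` if saving reformats anything.

```python
import difflib
from pathlib import Path


def diff_config_files(old_path, new_path):
    """Return the added ('+') and removed ('-') lines between two text files."""
    old_lines = Path(old_path).read_text().splitlines()
    new_lines = Path(new_path).read_text().splitlines()
    diff = list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile=str(old_path), tofile=str(new_path),
        n=0, lineterm='',
    ))
    # The first two entries are the '---'/'+++' file headers; hunk headers
    # start with '@@' and are filtered out by the prefix check.
    return [line for line in diff[2:] if line.startswith(('+', '-'))]


old_file = Path('default_training_configs.yaml')
new_file = Path('my_test_config.yaml')
# Only run the comparison if both files from the steps above exist.
if old_file.exists() and new_file.exists():
    for line in diff_config_files(old_file, new_file):
        print(line)
```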

### 2. Setting by TrainingConfig
It's easy to set configs using the `TrainingConfig` class: just set the fields on a `TrainingConfig` instance.
You can get a description of each config field with `get_config_help()`.

In [None]:
from towhee.trainer.training_config import get_config_help
help_dict = get_config_help() # get config field introductions.

You can pass values to the constructor, or modify fields on an existing instance afterwards.

In [None]:
# pass values to the constructor (the values here are only examples)
training_configs = TrainingConfig(
    batch_size=2,
    epoch_num=2
)
# or set fields on an existing instance
training_configs.output_dir = 'my_test_output'
training_configs.epoch_num = 3

### 3. Setting by yaml file

Your yaml file can be brief, with just a few lines. You do not need to write out every setting.
```yaml
train:
    output_dir: my_another_output
```
A yaml like this also works. Any field that is not written in the file keeps its default value.
There are some points you should pay attention to:
- If a value is `None` in Python, leave the value after the colon empty.
- If the value is `True`/`False` in Python, write `true`/`false` in yaml.
- If the field is a `str` in Python, no quotation marks are required.
- If the field value is a `dict` in Python, start a new line after the colon and write each key-value pair on its own indented line. For example,
```yaml
    early_stopping:
        monitor: eval_epoch_metric
        patience: 4
        mode: max
```
equals
```python
early_stopping = {
    'monitor': 'eval_epoch_metric',
    'patience': 4,
    'mode': 'max'
}
```
in Python.
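
The rules above can be checked by parsing a short yaml snippet and looking at the Python values that come out. The sketch below assumes PyYAML is installed (imported as `yaml`). The `DEFAULTS` dict is a hand-copied subset of the dumped defaults, and `merge_over_defaults` is a hypothetical helper that only illustrates the idea of a partial file overriding defaults section by section. It is not Towhee's own loading code.

```python
import copy

import yaml  # PyYAML, assumed to be installed

# A few defaults copied by hand from the dumped file, as a nested dict.
DEFAULTS = {
    'train': {'output_dir': './output_dir', 'batch_size': 8, 'eval_steps': None},
}

PARTIAL_YAML = """
train:
    output_dir: my_another_output
    eval_steps:
    overwrite_output_dir: true
callback:
    early_stopping:
        monitor: eval_epoch_metric
        patience: 4
        mode: max
"""


def merge_over_defaults(defaults, overrides):
    """Return a copy of defaults with each section's fields overridden."""
    merged = copy.deepcopy(defaults)
    for section, fields in overrides.items():
        # An empty section parses as None, so fall back to an empty dict.
        merged.setdefault(section, {}).update(fields or {})
    return merged


parsed = yaml.safe_load(PARTIAL_YAML)
merged = merge_over_defaults(DEFAULTS, parsed)

print(parsed['train']['eval_steps'])            # empty value after the colon
print(parsed['train']['overwrite_output_dir'])  # yaml `true`
print(parsed['train']['output_dir'])            # unquoted string
print(parsed['callback']['early_stopping'])     # nested key-value pairs
print(merged['train'])                          # batch_size comes from DEFAULTS
```

In this sketch, `batch_size` is absent from the partial yaml, so the merged result keeps the default for it while `output_dir` takes the new value.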