# Premade Estimators
- https://www.tensorflow.org/tutorials/estimator/premade?hl=ko

### tf.estimator: a high-level TensorFlow API
- TensorFlow's high-level representation of a complete model (a model-level abstraction), designed for easy scaling and asynchronous training
- Encapsulates training, evaluation, prediction, and export for serving in TF

In [1]:
import numpy as np  # needed by the input_evaluation_set example below
import pandas as pd
import tensorflow as tf

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

In [2]:
train_path = tf.keras.utils.get_file( # local path, download url
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv


In [3]:
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)

In [4]:
train.head()

|   | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 | 1 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |


In [6]:
train_y = train.pop('Species')
test_y = test.pop('Species')

- Estimator: any class derived from tf.estimator.Estimator
- The tf.estimator collection implements common ML algorithms (e.g. LinearRegressor)
- Beyond those, you can write your own custom Estimators.


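What does a TF program that uses an Estimator do?
- Create one or more input functions.
- Define the model's feature columns.
- Instantiate an Estimator, specifying the feature columns and various hyperparameters.
- Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.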


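Input function: a function that returns a tf.data.Dataset object yielding a two-element tuple
- features: a Python dictionary mapping each feature name to an array of that feature's values
- label: an array of the label values for every example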

In [8]:
def input_evaluation_set(): # example function showing the form of an input function's output
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])
    return features, labels

You can also use TF's Dataset API, which can parse all kinds of data into the form above
- https://www.tensorflow.org/guide/data?hl=ko
- tf.data.Dataset: can read records in parallel and combine them into a single stream

In [9]:
# Load the data with pandas and build an input pipeline from the in-memory data.

def input_fn(features, labels, training=True, batch_size=256):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    if training:
        dataset = dataset.shuffle(1000).repeat()
    
    return dataset.batch(batch_size)
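
To make the effect of `shuffle(1000).repeat()` followed by `batch(batch_size)` more concrete, here is a minimal plain-Python sketch that imitates those three steps on a small list. It does not use tf.data: `simulated_pipeline` and its parameters are hypothetical names for illustration, and it assumes the shuffle buffer is at least as large as the dataset, so that each pass is a full reshuffle.

```python
import itertools
import random


def simulated_pipeline(examples, batch_size, training=True, seed=0):
    """Plain-Python imitation of shuffle -> repeat -> batch (not tf.data)."""
    rng = random.Random(seed)

    def stream():
        if not training:
            # Evaluation: a single ordered pass over the data.
            yield from examples
            return
        while True:
            # Training: reshuffle on every pass and never stop (like repeat()).
            epoch = list(examples)
            rng.shuffle(epoch)
            yield from epoch

    it = stream()
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


data = list(range(5))

# The training stream is endless, so take only the first three batches.
train_batches = list(itertools.islice(simulated_pipeline(data, 4), 3))

# The evaluation stream ends after one pass; the last batch may be short.
eval_batches = list(simulated_pipeline(data, 4, training=False))
print(eval_batches)  # [[0, 1, 2, 3], [4]]
```

In this sketch a training batch can straddle two passes over the data, because batching happens after the repeated stream is formed. The same ordering (repeat, then batch) is used in `input_fn` above.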

In [11]:
# Feature columns: objects describing how the model should use the raw input data in the features dictionary
# tf.feature_column module

my_feature_columns = []
for k in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=k))

In [16]:
# Instantiate the estimator
# - examples: tf.estimator.DNNClassifier, DNNLinearCombinedClassifier, LinearClassifier
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns, # feature columns
    hidden_units=[30, 10], # two hidden layers with 30 and 10 nodes
    n_classes=3 # the model must choose among 3 classes
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/1c/xhkpty0142l7xwkrqtm3t9p80000gn/T/tmpljr47y3b', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [17]:
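# An estimator provides methods for training, evaluation, and prediction

# Train
classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    # input pipeline that batches the in-memory data
    steps=5000
)
# The input_fn call is wrapped in a lambda so that its arguments are captured
# while the Estimator still receives an input function that takes no arguments.
# The steps argument tells train to stop after that number of training steps.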

Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /var/folders/1c/xhkpty0142l7xwkrqtm3t9p80000gn/T/tmpljr47y3b/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 1.3187697, step = 0
INFO:tensorflow:global_step/sec: 550.749
INFO:tensorflow:loss = 0.985477, step = 100 (0.182 sec)
INFO:tensorflow:global_step/sec: 748.846
INFO:tensorflow:loss = 0.90403473, step = 200

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x160560fd0>
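
The training cell above passes `input_fn` wrapped in a `lambda` and bounds training with `steps`. Both ideas can be illustrated without TensorFlow. In this sketch `run_training` is a hypothetical stand-in for a method that calls its `input_fn` with no arguments, and `make_batches` is a made-up input function; neither is part of the TensorFlow API.

```python
import functools
import itertools


def make_batches(values, batch_size=2):
    """Hypothetical input function: yields batches forever, like repeat()."""
    cycled = itertools.cycle(values)
    while True:
        yield [next(cycled) for _ in range(batch_size)]


def run_training(input_fn, steps):
    """Hypothetical trainer: calls input_fn with NO arguments, stops after `steps` batches."""
    seen = []
    for batch in itertools.islice(input_fn(), steps):
        seen.append(batch)  # one "training step" per batch
    return seen


# The lambda captures the argument now and is called later without arguments.
batches = run_training(lambda: make_batches([1, 2, 3]), steps=3)
print(batches)  # [[1, 2], [3, 1], [2, 3]]

# functools.partial builds an equivalent zero-argument callable.
same = run_training(functools.partial(make_batches, [1, 2, 3]), steps=3)
```

Because the batch stream never ends, `steps` is the only thing that terminates the loop in this sketch.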

In [18]:
# Evaluate
eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False)
)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-01-07T23:37:05Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/1c/xhkpty0142l7xwkrqtm3t9p80000gn/T/tmpljr47y3b/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.22893s
INFO:tensorflow:Finished evaluation at 2021-01-07-23:37:06
INFO:tensorflow:Saving dict for global step 5000: accuracy = 0.7, average_loss = 0.5410508, global_step = 5000, loss = 0.5410508
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5000: /var/folders/1c/xhkpty0142l7xwkrqtm3t9p80000gn/T/tmpljr47y3b/model.ckpt-5000
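
The metrics in the log above are also what `evaluate` returns as a dictionary. As a small sketch, the dictionary below is typed in by hand from that log line rather than produced by running the model, and shows one way to format the result.

```python
# Values copied by hand from the "Saving dict for global step 5000" log line.
eval_result = {
    "accuracy": 0.7,
    "average_loss": 0.5410508,
    "loss": 0.5410508,
    "global_step": 5000,
}

# str.format ignores the keys it does not reference.
summary = "Test set accuracy: {accuracy:0.3f}".format(**eval_result)
print(summary)  # Test set accuracy: 0.700
```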


In [39]:
# Predict
# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}


In [40]:
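def input_fn(features, batch_size=256): # input function without labels
    # .batch() belongs to the Dataset, not to the dict, so close from_tensor_slices(...) first
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)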

In [32]:

def input_fn(features, batch_size=256):
    """An input function for prediction."""
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

predictions = classifier.predict(
    input_fn=lambda: input_fn(predict_x))

In [33]:
predictions

<generator object Estimator.predict at 0x162699c50>
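
Since `predict` returns a generator, nothing is computed until the result is iterated, which is why the model's log lines only show up in the later loop. A plain-Python sketch of that behaviour, where `fake_predict` is a hypothetical stand-in and not the Estimator method:

```python
events = []


def fake_predict(inputs):
    """Hypothetical stand-in for a lazy predict(): work starts on first iteration."""
    events.append("model loaded")  # runs only once iteration begins
    for x in inputs:
        yield {"input": x, "doubled": 2 * x}


gen = fake_predict([1, 2, 3])
print(events)  # [] because nothing has run yet

first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # a generator can only be consumed once
print(len(first_pass), len(second_pass))  # 3 0
```

To iterate over predictions a second time, call the predicting function again to obtain a fresh generator.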

In [38]:
for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%), expected "{}"'.format(
        SPECIES[class_id], 100 * probability, expec))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/1c/xhkpty0142l7xwkrqtm3t9p80000gn/T/tmpljr47y3b/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Prediction is "Setosa" (82.2%), expected "Setosa"
Prediction is "Virginica" (46.8%), expected "Versicolor"
Prediction is "Virginica" (66.8%), expected "Virginica"
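
Each `pred_dict` pairs a predicted class id with per-class probabilities. The sketch below uses plain Python and made-up logits (not values from this model) to show how a softmax turns raw scores into probabilities and how taking the index of the largest one yields a class id. That `class_ids` holds the index of the largest probability is an assumption here, though it matches the loop above, which prints the probability at that index.

```python
import math

SPECIES = ["Setosa", "Versicolor", "Virginica"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = [z - max(logits) for z in logits]  # subtract the max for stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.2, 1.1, 1.4]  # hypothetical scores for one flower
probabilities = softmax(logits)

# Index of the largest probability, i.e. the predicted class.
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)

print('Prediction is "{}" ({:.1f}%)'.format(
    SPECIES[class_id], 100 * probabilities[class_id]))
```

A winning probability below 50%, as in the second prediction above, means the model split its confidence across classes; with three classes the top class only needs to exceed the other two.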
