# basic
---

# pandas
---
```python
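import pandas as pd

# Read the CSV, set the column names, and pop the label column.
# `column_names` and `label_name` are placeholders for your own schema.
train = pd.read_csv(train_path, names=column_names, header=0)  # returns a DataFrame
train_x, train_y = train, train.pop(label_name)  # pop() takes a single column label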
```



# Tensors
---
- tf.Variable
- tf.constant
- tf.placeholder
- tf.SparseTensor

> Rank

- rank 0: scalar
- rank 1: vector (for example, a list)
- rank 2: matrix, and so on for higher ranks
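
To make the rank idea concrete, here is a small sketch that uses nested Python lists as stand-ins for tensors. `rank` is a hypothetical helper (not a TensorFlow function) and assumes non-empty, rectangular nesting:

```python
def rank(value):
    """Count the nesting depth of a list-based stand-in for a tensor."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]  # assumption: non-empty, rectangular nesting
    return depth

scalar = 3.0
vector = [1.0, 2.0, 3.0]
matrix = [[1.0, 2.0], [3.0, 4.0]]
ranks = [rank(scalar), rank(vector), rank(matrix)]
print(ranks)  # [0, 1, 2]
```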

# Variable
---
`tf.Variable` holds a persistent tensor; ops can read and modify its value.

> ## 1 Creating a Variable

The recommended way to create and use a variable is `tf.get_variable`, which requires a name.
```python
my_variable = tf.get_variable('my_variable', [1, 2, 3])

my_int_variable = tf.get_variable('my_int_variable', [1, 2, 3],
                       dtype=tf.int32,
                       initializer=tf.zeros_initializer)
```

> ## 2 Variable collections

`tf.GraphKeys`

Variables that should not be trained can be added to `tf.GraphKeys.LOCAL_VARIABLES`:
```python
my_local = tf.get_variable('my_local', shape=[], collections=[tf.GraphKeys.LOCAL_VARIABLES])
# or
my_non_trainable = tf.get_variable('my_non_trainable', shape=(), trainable=False)

tf.add_to_collection('xx', my_local)

tf.get_collection('xx')
```

> ## 3 Device placement

```python
with tf.device('/device:GPU:1'):
    v = tf.get_variable('v', [1])

cluster_spec = {
    "ps": ["ps0:2222", "ps1:2222"],
    "worker": ["worker0:2222", "worker1:2222", "worker2:2222"]}
with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):
  v = tf.get_variable("v", shape=[20, 20])  # this variable is placed
                                            # in the parameter server
                                            # by the replica_device_setter
```

> ## 4 Initializing variable

```python
tf.global_variables_initializer()
```
> ## 5 Using variables

> ## 6 Sharing variables

```python
with tf.variable_scope("model"):
  output1 = my_image_filter(input1)
with tf.variable_scope("model", reuse=True):
  output2 = my_image_filter(input2)
  
with tf.variable_scope("model") as scope:
  output1 = my_image_filter(input1)
  scope.reuse_variables()
  output2 = my_image_filter(input2)
  
with tf.variable_scope("model") as scope:
  output1 = my_image_filter(input1)
with tf.variable_scope(scope, reuse=True):
  output2 = my_image_filter(input2)
```
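
The three forms above all rely on get-or-create semantics: without `reuse` a name must be new, with `reuse=True` it must already exist. A pure-Python sketch of that idea, not TensorFlow's implementation; `VariableStore` and its method signature are made up for illustration:

```python
class VariableStore:
    """Toy registry mimicking the get-or-create rules of variable scopes."""
    def __init__(self):
        self._vars = {}

    def get_variable(self, scope, name, reuse=False, initial=0.0):
        full_name = scope + "/" + name
        if reuse:
            if full_name not in self._vars:
                raise ValueError("Variable %s does not exist" % full_name)
            return self._vars[full_name]
        if full_name in self._vars:
            raise ValueError("Variable %s already exists" % full_name)
        self._vars[full_name] = [initial]  # a mutable cell stands in for a tensor
        return self._vars[full_name]

store = VariableStore()
first = store.get_variable("model", "weights")
second = store.get_variable("model", "weights", reuse=True)
same_object = first is second
print(same_object)  # True
```

Asking for `model/weights` a second time without `reuse=True` raises `ValueError` in this sketch, which mirrors why the second call of `my_image_filter` needs a reusing scope.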

# Graph
---
> ## 1 Dataflow

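TensorFlow uses dataflow to run model computations. The main advantages of dataflow are:
- parallel processing
- distributed execution
- ahead-of-time compilation, which makes data processing faster
- the graph can be saved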

> ## 2 tf.Graph

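1. Graph structure
    
    Every step TensorFlow performs is recorded in the graph, which has two main parts:
    - `tf.Operation` (also called an op): every node in the graph is an op, which describes a computation such as addition, subtraction, multiplication, or division.
    - `tf.Tensor`: every edge in the graph is a tensor, representing the values a node needs for its computation.

    A **tensor** does not store a value. It is only a part of the computation graph that records the value's type and shape, along with a reference to the value that will be computed.
2. Graph collections
    - TensorFlow defines collections for storing a model's temporary data. `tf.add_to_collection` adds a value to a named collection (the predefined collection names live in `tf.GraphKeys`), and `tf.get_collection` returns the values in a collection.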
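
The split between building a graph and running it can be sketched in plain Python. This is only a toy model of the idea, not how TensorFlow is implemented; `Op`, `constant`, and `run` are hypothetical names:

```python
import operator

class Op:
    """A graph node: records which function to apply and which ops feed it."""
    def __init__(self, name, fn, inputs=()):
        self.name, self.fn, self.inputs = name, fn, tuple(inputs)

def constant(name, value):
    # An op with no inputs that produces a fixed value when run.
    return Op(name, lambda: value)

def run(op, cache=None):
    """Evaluate an op by first evaluating the ops it depends on."""
    cache = {} if cache is None else cache
    if op.name not in cache:
        args = [run(i, cache) for i in op.inputs]
        cache[op.name] = op.fn(*args)
    return cache[op.name]

# Building the graph computes nothing yet.
a = constant("a", 2.0)
b = constant("b", 3.0)
total = Op("add", operator.add, [a, b])
product = Op("mul", operator.mul, [total, b])

result = run(product)  # (2.0 + 3.0) * 3.0
print(result)  # 15.0
```

Here `run` plays the role that `tf.Session.run` plays for a real graph: values exist only while the graph is being executed.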

> ## 3 building a graph

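- `tf.constant(32)` creates an op that produces the value 32, adds it to the default graph, and returns a tensor.
- `tf.matmul(x, y)` creates an op that multiplies two tensors, adds it to the default graph, and returns a tensor representing the product.
- `v = tf.Variable(0)` adds to the graph a tensor whose value can be modified.
- `tf.train.Optimizer.minimize` adds tensors and operations to the graph to compute gradients, and returns an op that updates the variables when run.

It is also common to build separate graphs for training and evaluation.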

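> ## 4 Naming operations

A graph defines a **namespace** for its ops. There are two main ways to name an op:
- `tf.constant(2, name='test')` creates an op named `test` that produces 2; its output tensor is `"test:0"`.
- `tf.name_scope('xxx')` adds a name prefix to every op created in its context.

**If an op name is already taken, `_1`, `_2`, ... is appended to keep names unique.**

**Good names make the graph easier to read in TensorBoard.**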
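
A minimal sketch of the uniquifying rule, assuming the simplest possible bookkeeping. `NameRegistry` is a hypothetical class; a real graph handles more cases (for example, a user explicitly asking for `test_1`):

```python
class NameRegistry:
    """Hands out unique op names within one (toy) graph."""
    def __init__(self):
        self._used = {}  # base name -> how many times it has been requested

    def unique_name(self, name):
        count = self._used.get(name, 0)
        self._used[name] = count + 1
        return name if count == 0 else "{}_{}".format(name, count)

graph_names = NameRegistry()
names = [graph_names.unique_name("test") for _ in range(3)]
scoped = graph_names.unique_name("xxx/" + "test")  # a name scope adds a prefix
print(names)   # ['test', 'test_1', 'test_2']
print(scoped)  # xxx/test
```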

> ## 5 placing operations on different devices

`tf.device` specifies which device an op runs on.

On a single machine with one CPU and one GPU:
```python
weights = tf.random_normal(...)

with tf.device('/device:CPU:0'):
    img = tf.image.decode_jpeg(tf.read_file('xx'))  # decode_jpeg takes file contents, not a path
with tf.device('/device:GPU:0'):
    result = tf.matmul(weights, img)
```
In a distributed setup, specify the job name and task ID: variables go on `/job:ps`, and ops go on `/job:worker`.
```python
with tf.device('/job:ps/task:0'):
    weights_1 = tf.Variable(...)
    biases_1 = tf.Variable(...)

with tf.device('/job:ps/task:1'):
    weights_2 = tf.Variable(...)
    biases_2 = tf.Variable(...)

with tf.device('/job:worker'):
    layer_1 = tf.matmul(train_batch, weights_1) + biases_1
    layer_2 = tf.matmul(train_batch, weights_2) + biases_2
```
A simpler alternative is `tf.train.replica_device_setter()`:
```python
with tf.device(tf.train.replica_device_setter(ps_tasks=3)):
  # tf.Variable objects are, by default, placed on tasks in "/job:ps" in a
  # round-robin fashion.
  w_0 = tf.Variable(...)  # placed on "/job:ps/task:0"
  b_0 = tf.Variable(...)  # placed on "/job:ps/task:1"
  w_1 = tf.Variable(...)  # placed on "/job:ps/task:2"
  b_1 = tf.Variable(...)  # placed on "/job:ps/task:0"

  input_data = tf.placeholder(tf.float32)     # placed on "/job:worker"
  layer_0 = tf.matmul(input_data, w_0) + b_0  # placed on "/job:worker"
  layer_1 = tf.matmul(layer_0, w_1) + b_1     # placed on "/job:worker"
```
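
The round-robin assignment described in the comments above can be sketched in a few lines. This is an illustration of the placement order only, not the library's code; `round_robin_placement` is a hypothetical helper:

```python
from itertools import cycle

def round_robin_placement(variable_names, ps_tasks):
    """Assign each variable to a parameter-server task in turn."""
    tasks = cycle(range(ps_tasks))
    return {name: "/job:ps/task:{}".format(next(tasks)) for name in variable_names}

placement = round_robin_placement(["w_0", "b_0", "w_1", "b_1"], ps_tasks=3)
for name, device in placement.items():
    print(name, "->", device)
```

With three tasks and four variables, the fourth variable wraps around to task 0, matching the comments in the example above.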

> ## 6 tensor-like objects

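The tensor is TensorFlow's basic unit of execution. For convenience, several tensor-like objects can be used wherever a tensor is expected:
- `tf.Tensor`
- `tf.Variable`
- `numpy.ndarray`
- `list`
- scalar Python types: `bool`, `float`, `int`, `str`

By default, a new tensor object is created every time a tensor-like object is used. If the tensor-like object is large and used several times, convert it once with `tf.convert_to_tensor` and reuse the result to avoid running out of memory.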

> ## 7 Executing a graph in a tf.Session

1. Creating a tf.Session

```python
# create a default in-process session
with tf.Session() as sess:
    ...

# create a remote session
with tf.Session('grpc://example.org:2222'):
    ...
```

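High-level APIs such as `tf.train.MonitoredTrainingSession` and `tf.estimator.Estimator` create and manage a session for you. These APIs accept arguments that configure the session:
- `target`: if empty, the session uses only devices on the local machine; it can also be the `grpc://` URL of a remote server.
- `graph`: defaults to the current default graph; specify it explicitly when working with multiple graphs.
- `config`: accepts a `tf.ConfigProto` that controls the session. Its main options are:
    - `allow_soft_placement`: if True, an op that cannot run on its requested device (for example, no GPU is available) falls back to a supported device such as the CPU.
    - `cluster_def`: for distributed execution, specifies which machines take part in the computation.
    - `graph_options.optimizer_options`: controls the optimizations applied to the graph before it runs.
    - `gpu_options.allow_growth`: if True, GPU memory is allocated gradually as needed instead of mostly up front.
    
2. Using tf.Session.run to execute operations

`sess.run` runs an op or evaluates a tensor.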

> ## 8 Visualizing your graph

```python
# Build your graph.
x = tf.constant([[37.0, -23.0], [1.0, 4.0]])
w = tf.Variable(tf.random_uniform([2, 2]))
y = tf.matmul(x, w)
# ...
loss = ...
train_op = tf.train.AdagradOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
  # `sess.graph` provides access to the graph used in a `tf.Session`.
  writer = tf.summary.FileWriter("/tmp/log/...", sess.graph)

  # Perform your computation...
  for i in range(1000):
    sess.run(train_op)
    # ...

  writer.close()
```

> ## 9 Programming with multiple graphs



# TensorBoard
---
Visualize the graph:
```python
a = tf.constant(2.0)
b = tf.constant(3.0)
result = a + b
writer = tf.summary.FileWriter('.')
writer.add_graph(tf.get_default_graph())
# Then run in a terminal: tensorboard --logdir=path
# where `path` is the directory passed to FileWriter (here, the current directory)
```

# Session
---
A graph in TensorFlow is like a `.py` file in Python, and `tf.Session` is like the Python interpreter that executes it.
```python
sess = tf.Session()

with tf.Session() as sess:
    ...
```

# Feeding
---
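To avoid creating tensors over and over during training, `tf.placeholder` creates a slot that works like a function parameter: you only supply the value when the graph is run.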
```python
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = x + y

with tf.Session() as sess:
    print(sess.run(z, feed_dict={x:3, y:4.5}))
    print(sess.run(z, feed_dict={x:[3, 4], y:[4, 5]}))
```
**Once a placeholder is created, its value must be supplied through `feed_dict` when the graph is run.**


# Dataset
---
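The `tf.data` module loads data, preprocesses it, and feeds it to the model. It can read from `numpy` arrays and CSV files. To consume tensors from a dataset, first create an **iterator** with `dataset.make_one_shot_iterator()`, then call `get_next()` to fetch elements. When no elements remain, running `get_next()` raises `tf.errors.OutOfRangeError`.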

- `Dataset` - Base class
- `TextLineDataset` - Reads lines from text file.
- `TFRecordDataset` - Reads records from TFRecord files.
- `FixedLengthRecordDataset` - Reads fixed size record from binary files.
- `Iterator`

> ## 1 Basic mechanics

1. Define a source and build a **Dataset** with `tf.data.Dataset.from_tensors()`, `tf.data.Dataset.from_tensor_slices()`, or `tf.data.TFRecordDataset()`.
2. Transform it into a new dataset by chaining methods such as `Dataset.map()` and `Dataset.batch()`.
3. Make an **iterator**. `Dataset.make_one_shot_iterator()` needs no explicit initialization; an iterator from `Dataset.make_initializable_iterator()` must first be initialized by running `Iterator.initializer`. Use `Iterator.get_next()` to fetch values.
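
The three steps above have a close analogue in plain Python generators. The sketch below only mirrors the shape of the pipeline (source, transformations, iterator); the helper names are hypothetical and `StopIteration` stands in for `OutOfRangeError`:

```python
def from_slices(values):
    # Source: yield one element at a time.
    for v in values:
        yield v

def batch(elements, batch_size):
    # Transformation: group consecutive elements into lists.
    current = []
    for e in elements:
        current.append(e)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current  # final, possibly smaller, batch

source = from_slices(range(5))
mapped = map(lambda x: x * 10, source)  # like Dataset.map()
batched = batch(mapped, 2)              # like Dataset.batch()
iterator = iter(batched)                # like making an iterator

batches = []
while True:
    try:
        batches.append(next(iterator))  # like get_next()
    except StopIteration:               # plays the role of OutOfRangeError
        break
print(batches)  # [[0, 10], [20, 30], [40]]
```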

> ## 2 Dataset structure

Every dataset has two properties, `Dataset.output_types` and `Dataset.output_shapes`.
```python
dataset1 = tf.data.Dataset.from_tensor_slices(tf.random_uniform([4, 10]))
print(dataset1.output_types)  # ==> "tf.float32"
print(dataset1.output_shapes)  # ==> "(10,)"

dataset2 = tf.data.Dataset.from_tensor_slices(
   (tf.random_uniform([4]),
    tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)))
print(dataset2.output_types)  # ==> "(tf.float32, tf.int32)"
print(dataset2.output_shapes)  # ==> "((), (100,))"

dataset3 = tf.data.Dataset.zip((dataset1, dataset2))
print(dataset3.output_types)  # ==> (tf.float32, (tf.float32, tf.int32))
print(dataset3.output_shapes)  # ==> "(10, ((), (100,)))"
```
Using a dict:
```python
dataset = tf.data.Dataset.from_tensor_slices(
   {"a": tf.random_uniform([4]),
    "b": tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)})
print(dataset.output_types)  # ==> "{'a': tf.float32, 'b': tf.int32}"
print(dataset.output_shapes)  # ==> "{'a': (), 'b': (100,)}"
```
Using `map`, `flat_map`, and `filter`:
```python
dataset1 = dataset1.map(lambda x: ...)

dataset2 = dataset2.flat_map(lambda x, y: ...)

# Note: tuple-parameter unpacking (`lambda x, (y, z): ...`) is Python 2 only.
# In Python 3, accept the nested tuple as one argument and unpack it inside.
dataset3 = dataset3.filter(lambda x, yz: ...)
```

> ## 3 Creating an iterator

```python
dataset = tf.data.Dataset.range(100)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

for i in range(100):
    value = sess.run(next_element)
    assert i == value
    
###### Initializable iterator: lets you parameterize the dataset ######
max_value = tf.placeholder(tf.int64, shape=[])
dataset = tf.data.Dataset.range(max_value)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
# Initialize the iterator over a dataset with 10 elements
sess.run(iterator.initializer, feed_dict={max_value: 10})
for i in range(10):
    value = sess.run(next_element)

# Re-initialize the iterator over a dataset with 100 elements
sess.run(iterator.initializer, feed_dict={max_value: 100})
for i in range(100):
    value = sess.run(next_element)
```

> ### 3.1 reinitializable

> ### 3.2 Consuming values from an iterator

```python
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
sess.run(iterator.initializer)
```

```python
training_dataset = tf.data.Dataset.range(100).map(
    lambda x: x + tf.random_uniform([], -10, 10, tf.int64))

validation_dataset = tf.data.Dataset.range(50)

iterator = tf.data.Iterator.from_structure(training_dataset.output_types,
                            training_dataset.output_shapes)

next_element = iterator.get_next()

training_init_op = iterator.make_initializer(training_dataset)
validation_init_op = iterator.make_initializer(validation_dataset)

for _ in range(20):
    sess.run(training_init_op)
    for _ in range(100):
        sess.run(next_element)
        
    sess.run(validation_init_op)
    for _ in range(50):
        sess.run(next_element)
```
> ### 3.3 feedable iterator


> ## 4 Reading input data

> ### 4.1 Consuming Numpy arrays

```python
# Note: the `with` form and key lookup apply to `.npz` archives;
# the keys are whatever names the arrays were saved under.
with np.load('xxx.npy') as data:
    features = data['feature']
    labels = data['labels']
    
assert features.shape[0] == labels.shape[0]

features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))

iterator = dataset.make_initializable_iterator()

sess.run(iterator.initializer, feed_dict={features_placeholder: features, labels_placeholder: labels})
```

> ### 4.2 Consuming TFRecord data

```python
filenames = tf.placeholder(tf.string, shape=[None])

dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...).repeat().batch(...)  # supply a parse function and a batch size
iterator = dataset.make_initializable_iterator()

# Initialize `iterator` with training data.
training_filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
sess.run(iterator.initializer, feed_dict={filenames: training_filenames})

# Initialize `iterator` with validation data.
validation_filenames = ["/var/data/validation1.tfrecord", ...]
sess.run(iterator.initializer, feed_dict={filenames: validation_filenames})
```

> ### 4.3 Consuming text data

```python
filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]

dataset = tf.data.Dataset.from_tensor_slices(filenames)

# Use `Dataset.flat_map()` to transform each file as a separate nested dataset,
# and then concatenate their contents sequentially into a single "flat" dataset.
# * Skip the first line (header row).
# * Filter out lines beginning with "#" (comments).
dataset = dataset.flat_map(
    lambda filename: (
        tf.data.TextLineDataset(filename)
        .skip(1)
        .filter(lambda line: tf.not_equal(tf.substr(line, 0, 1), "#"))))
```

> ### 4.4 read data from csv

```python
# build dataset
ds = tf.data.TextLineDataset(train_path).skip(1)

# build a CSV line parser
COLUMNS = ['SepalLength', 'SepalWidth',
       'PetalLength', 'PetalWidth',
       'label']
FIELD_DEFAULTS = [[0.0], [0.0], [0.0], [0.0], [0]]

def _parse_line(line):
    fields = tf.decode_csv(line, FIELD_DEFAULTS)
    features = dict(zip(COLUMNS, fields))
    
    label = features.pop('label')
    return features, label

# parse the lines
ds = ds.map(_parse_line)
```
> ## 5 Preprocessing data with `Dataset.map()`

> ### 5.1 Parsing `tf.Example` protocol buffer message

```python
def _parse_function(example_proto):
    # tf.Example stores integers as int64, so the label is parsed as tf.int64.
    features = {'image': tf.FixedLenFeature((), tf.string, default_value=''),
            'label': tf.FixedLenFeature((), tf.int64, default_value=0)}
    parsed_features = tf.parse_single_example(example_proto, features)
    return parsed_features['image'], parsed_features['label']

filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(_parse_function)
```
> ### 5.2 Decoding image data and resizing it

```python
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label

filenames = tf.constant(["/var/data/image1.jpg", "/var/data/image2.jpg", ...])

labels = tf.constant([0, 37, ...])

dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(_parse_function)
```
> ### 5.3 Applying arbitrary Python logic with `tf.py_func()`

```python
import cv2

# Use a custom OpenCV function to read the image, instead of the standard
# TensorFlow `tf.read_file()` operation.
def _read_py_function(filename, label):
  image_decoded = cv2.imread(filename.decode(), cv2.IMREAD_GRAYSCALE)
  return image_decoded, label

# Use standard TensorFlow operations to resize the image to a fixed shape.
def _resize_function(image_decoded, label):
  image_decoded.set_shape([None, None, None])
  image_resized = tf.image.resize_images(image_decoded, [28, 28])
  return image_resized, label

filenames = ["/var/data/image1.jpg", "/var/data/image2.jpg", ...]
labels = [0, 37, 29, 1, ...]

dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(
    lambda filename, label: tuple(tf.py_func(
        _read_py_function, [filename, label], [tf.uint8, label.dtype])))
dataset = dataset.map(_resize_function)
```
> ## 6 Batching dataset elements

> ### 6.1 Simple batching

`Dataset.batch()`
> ### 6.2 Batching tensors with padding

```python
dataset = tf.data.Dataset.range(100)
dataset = dataset.map(lambda x: tf.fill([tf.cast(x, tf.int32)], x))
dataset = dataset.padded_batch(4, padded_shapes=[None])

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

print(sess.run(next_element))  # ==> [[0, 0, 0], [1, 0, 0], [2, 2, 0], [3, 3, 3]]
print(sess.run(next_element))  # ==> [[4, 4, 4, 4, 0, 0, 0],
                               #      [5, 5, 5, 5, 5, 0, 0],
                               #      [6, 6, 6, 6, 6, 6, 0],
                               #      [7, 7, 7, 7, 7, 7, 7]]
```
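
What `padded_batch` does to variable-length elements can be reproduced in plain Python. A sketch only, assuming one-dimensional sequences and zero padding; the function below is a hypothetical stand-in, not the `tf.data` implementation:

```python
def padded_batch(sequences, batch_size, pad_value=0):
    """Group sequences into batches, padding each batch to its longest member."""
    batches = []
    for start in range(0, len(sequences), batch_size):
        group = sequences[start:start + batch_size]
        width = max(len(seq) for seq in group)
        batches.append([seq + [pad_value] * (width - len(seq)) for seq in group])
    return batches

# Element x is the value x repeated x times, as in the example above.
sequences = [[x] * x for x in range(8)]
batches = padded_batch(sequences, batch_size=4)
print(batches[0])     # [[0, 0, 0], [1, 0, 0], [2, 2, 0], [3, 3, 3]]
print(batches[1][0])  # [4, 4, 4, 4, 0, 0, 0]
```

Note that the padding width is chosen per batch, which is why the first batch is 3 wide and the second is 7 wide.
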
> ## 7 Training workflows

> ### 7.1 Processing multiple epochs

Use `dataset.repeat()`, or re-initialize the iterator once per epoch.
```python
filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...)
dataset = dataset.repeat(10)
dataset = dataset.batch(32)
```
> ### 7.2 Randomly shuffling input data

`dataset = dataset.shuffle(buffer_size=10000)`
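
The `buffer_size` argument matters because shuffling is done through a fixed-size buffer. The sketch below shows the general technique in plain Python; it is an approximation for intuition, with a hypothetical function name and a `seed` parameter added for reproducibility:

```python
import random

def buffered_shuffle(elements, buffer_size, seed=None):
    """Yield elements in randomized order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for element in elements:
        if len(buffer) < buffer_size:
            buffer.append(element)  # fill the buffer first
            continue
        index = rng.randrange(buffer_size)
        yield buffer[index]         # emit a random buffered element
        buffer[index] = element     # and replace it with the new one
    while buffer:
        # Input exhausted: drain what is left, still in random order.
        yield buffer.pop(rng.randrange(len(buffer)))

shuffled = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(shuffled)
```

An element can only move forward as far as the buffer allows, so a buffer smaller than the dataset gives a partial shuffle, and `buffer_size=1` gives no shuffling at all.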

> ### 7.3 Using high-level APIs

```python
def dataset_input_fn():
  filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
  dataset = tf.data.TFRecordDataset(filenames)

  # Use `tf.parse_single_example()` to extract data from a `tf.Example`
  # protocol buffer, and perform any additional per-record preprocessing.
  def parser(record):
    keys_to_features = {
        "image_data": tf.FixedLenFeature((), tf.string, default_value=""),
        "date_time": tf.FixedLenFeature((), tf.int64, default_value=0),  # default must match the int64 dtype
        "label": tf.FixedLenFeature((), tf.int64,
                                    default_value=tf.zeros([], dtype=tf.int64)),
    }
    parsed = tf.parse_single_example(record, keys_to_features)

    # Perform additional preprocessing on the parsed data.
    image = tf.image.decode_jpeg(parsed["image_data"])
    image = tf.reshape(image, [299, 299, 1])
    label = tf.cast(parsed["label"], tf.int32)

    return {"image_data": image, "date_time": parsed["date_time"]}, label

  # Use `Dataset.map()` to build a pair of a feature dictionary and a label
  # tensor for each example.
  dataset = dataset.map(parser)
  dataset = dataset.shuffle(buffer_size=10000)
  dataset = dataset.batch(32)
  dataset = dataset.repeat(num_epochs)
  iterator = dataset.make_one_shot_iterator()

  # `features` is a dictionary in which each value is a batch of values for
  # that feature; `labels` is a batch of labels.
  features, labels = iterator.get_next()
  return features, labels
```

# argparse
---
```python
import argparse

parser = argparse.ArgumentParser()

parser.add_argument('--name', type=int, default=10, help='')

FLAGS, _ = parser.parse_known_args()
```
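
`parse_known_args` differs from `parse_args` in that unrecognized flags are returned instead of causing an error. A small runnable sketch; the flag names are made up, and an explicit argument list is passed so it does not depend on `sys.argv`:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', type=int, default=10, help='examples per batch')

# Pass an explicit list instead of reading sys.argv.
flags, unparsed = parser.parse_known_args(['--batch_size', '32', '--other_flag=x'])
print(flags.batch_size)  # 32
print(unparsed)          # ['--other_flag=x']
```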

# tf.feature_column
---
Feature columns are passed to the `tf.feature_column.input_layer` function.
```python
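# Treat `key` as a numeric column
tf.feature_column.numeric_column(key)

# Map values through a known vocabulary list
tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list)

# If the possible values are not known in advance, hash each value to an integer id
tf.feature_column.categorical_column_with_hash_bucket(key, hash_bucket_size, dtype=tf.string)

# Split a numeric column into ranges
tf.feature_column.bucketized_column(source_column, boundaries)

# Cross several columns into one; `keys` is a list of the column names to combine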
tf.feature_column.crossed_column(keys, hash_bucket_size, hash_key=None)
```
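
The ideas behind bucketizing and hash buckets can be sketched without TensorFlow. These helpers are hypothetical and use `bisect` and CRC32 as stand-ins; the hash function TensorFlow uses is different, so actual ids will not match:

```python
import bisect
import zlib

def bucketize(value, boundaries):
    """Index of the bucket `value` falls in; `boundaries` must be sorted."""
    return bisect.bisect_right(boundaries, value)

def hash_bucket(value, hash_bucket_size):
    """Map a string to one of `hash_bucket_size` ids (collisions are possible)."""
    return zlib.crc32(value.encode('utf-8')) % hash_bucket_size

def one_hot(index, depth):
    return [1.0 if i == index else 0.0 for i in range(depth)]

boundaries = [1960, 1980, 2000]  # 3 boundaries -> 4 buckets
years = [1950, 1960, 1999, 2015]
bucket_ids = [bucketize(y, boundaries) for y in years]
print(bucket_ids)  # [0, 1, 2, 3]
print(one_hot(bucket_ids[2], len(boundaries) + 1))  # [0.0, 0.0, 1.0, 0.0]
sports_id = hash_bucket('sports', 10)  # some id in [0, 10)
```
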
> eg:

```python
features = {
    'sales': [[5], [10], [8], [9]],
    'department': ['sports', 'sports', 'grad', 'grad']
}
depart = tf.feature_column.categorical_column_with_vocabulary_list(
    'department',['sports', 'grad'])
# indicator_column wraps any `categorical_column_*` as a one-hot (multi-hot) column
depart = tf.feature_column.indicator_column(depart)
columns = [
    tf.feature_column.numeric_column('sales'),
    depart
]
inputs = tf.feature_column.input_layer(features, columns)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    # vocabulary-based feature columns also need tables_initializer
    tf.tables_initializer().run()
    print(sess.run(inputs))
```
Output:
```
[[ 1.  0.  5.]
 [ 1.  0. 10.]
 [ 0.  1.  8.]
 [ 0.  1.  9.]]
```

# A simple linear model
---
```python
# Create the inputs and the target values
x = tf.constant([[1], [2], [3], [4]], dtype=tf.float32)
y_ = tf.constant([[0], [-1], [-2], [-3]], dtype=tf.float32)

# create linear model
y_pred = tf.layers.dense(x, 1)

# MSE loss
loss = tf.losses.mean_squared_error(labels=y_, predictions=y_pred)

# optimizer
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(100):
        _, loss_value = sess.run((train_op, loss))
        print(loss_value)
    
    print(sess.run(y_pred))
```
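
For intuition, the same training loop can be written out by hand in plain Python. This is a sketch of gradient descent on the MSE loss for the data above, assuming zero initial weights (the `tf.layers.dense` kernel starts from random values, so the numbers will differ):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.0, -1.0, -2.0, -3.0]

w, b = 0.0, 0.0  # assumption: start from zeros
learning_rate = 0.01
losses = []

for step in range(100):
    preds = [w * x + b for x in xs]
    errors = [p - y for p, y in zip(preds, ys)]
    losses.append(sum(e * e for e in errors) / len(xs))  # MSE
    # Gradients of the MSE with respect to w and b.
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2.0 * sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(losses[0], losses[-1])  # the loss shrinks as training proceeds
```

The data follow `y = -x + 1` exactly, so with enough steps `w` approaches -1 and `b` approaches 1.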

# Estimator
---
An Estimator supports the following actions:
- training
- evaluation
- prediction
- export for serving

# tf.estimator
---
```python
# DNNClassifier
classifier = tf.estimator.DNNClassifier(hidden_units, feature_columns, n_classes=2)

classifier.train(input_fn, hooks=None, steps=None)
classifier.predict(input_fn, predict_keys)
classifier.evaluate(input_fn)
```
---
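**input_fn** is a function that returns a `Dataset` object; its output should be a two-element tuple:
- `features`: a Python dict
    - key: the feature name
    - value: an array containing that feature's values
- `label`: an array holding the label of each example.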

```python
def input_evaluation_set():
    features = {'SepalLength': np.array([6.4, 5.0]),
            'SepalWidth':  np.array([2.8, 2.3]),
            'PetalLength': np.array([5.6, 3.3]),
            'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])
    return features, labels
```
Usage 2:
```python
run_config = tf.estimator.RunConfig(model_dir,...)

model_params = tf.contrib.training.HParams()

estimator = tf.estimator.Estimator(model_fn, config=run_config, params=model_params)

train_spec = tf.estimator.TrainSpec()

eval_spec = tf.estimator.EvalSpec()
```

## Define the model
---
### Define the input layer
```python
# write an input function
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()

# Create feature columns
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

# Write a model function
def my_model_fn(features, labels, mode, params):
    # Define the input layer
    net = tf.feature_column.input_layer(features, params['feature_columns'])

    # Define hidden layers
    for units in params['hidden_units']:
        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)

    # Define the output layer
    logits = tf.layers.dense(net, params['n_classes'], activation=None)

    # Predict operation
    predicted_classes = tf.argmax(logits, 1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            'class_ids': predicted_classes[:, tf.newaxis],
            'probabilities': tf.nn.softmax(logits),
            'logits': logits
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    # Loss function
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # Evaluate operation
    accuracy = tf.metrics.accuracy(labels=labels, predictions=predicted_classes, name='acc_op')
    metrics = {'accuracy': accuracy}
    tf.summary.scalar('accuracy', accuracy[1])

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)

    # Train operation (mode == tf.estimator.ModeKeys.TRAIN)
    optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

# Define classifier
classifier = tf.estimator.Estimator(model_fn=my_model_fn,
                                    params={
                                        'feature_columns': my_feature_columns,
                                        'hidden_units': [10, 10],
                                        'n_classes': 3
                                    })

# Train; the Estimator calls my_model_fn with mode=ModeKeys.TRAIN.
# Pass steps=... to bound training, since the dataset repeats indefinitely.
classifier.train(input_fn=lambda: train_input_fn(train_x, train_y, 500))
```


# tf.logging
---
```python
# Enable logging at INFO level
tf.logging.set_verbosity(tf.logging.INFO)
```

# checkpoints
---
Used by an estimator to save the model; configure the save interval and the maximum number of checkpoint files to keep.
```python
my_checkpointing_config = tf.estimator.RunConfig(
    save_checkpoints_secs=20*60,  # save a checkpoint every 20 minutes
    keep_checkpoint_max=10        # keep at most the 10 most recent checkpoints
)

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='xxx/xxx',
    config=my_checkpointing_config
)
```

# tf.layers
---
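A trainable model must be able to modify parameter values in the graph, so that the same input yields a different output after each round of training; training updates the graph's trainable parameters.

This module is mainly used to build neural networks. It provides fully connected layers, convolutional layers, activation functions, and dropout regularization. The example here is **MNIST**.

> CNNs (conv(relu)-pool-conv(relu)-pool-...-conv(relu)-dense-dense-output)

- **Convolutional layers**, which apply a ReLU activation function to their output.
- **Pooling layers**, which reduce the dimensionality.
- **Dense layers**, which perform classification on the extracted features.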

> build a model to classify the images in the mnist dataset.

1. conv1, `weights = [-1, 5, 5, 32]`, with ReLU
2. pool1, `kernel_size=[1, 2, 2, 1]`, `stride=[1, 2, 2, 1]`
3. conv2, `weights=[-1, 5, 5, 64]`, with ReLU
4. pool2, `kernel_size=[1, 2, 2, 1]`, `stride=[1, 2, 2, 1]`
5. dense1, `[-1, 1024]`, `dropout(0.4)`,
6. dense2, `[-1, 10]`
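
The layer list above implies a chain of tensor shapes that is easy to check by hand. The sketch below does that arithmetic, assuming 28x28 grayscale inputs, "same" padding with stride 1 in the conv layers, and 2x2 pooling with stride 2; the helper names are hypothetical:

```python
def conv_same(height, width, filters):
    # "same" padding with stride 1 keeps the spatial size; only depth changes.
    return height, width, filters

def max_pool(height, width, channels, pool=2, stride=2):
    # Non-overlapping 2x2 pooling with stride 2 halves height and width.
    return (height - pool) // stride + 1, (width - pool) // stride + 1, channels

shape = (28, 28, 1)                         # one grayscale MNIST image
shape = conv_same(shape[0], shape[1], 32)   # conv1 -> (28, 28, 32)
shape = max_pool(*shape)                    # pool1 -> (14, 14, 32)
shape = conv_same(shape[0], shape[1], 64)   # conv2 -> (14, 14, 64)
shape = max_pool(*shape)                    # pool2 -> (7, 7, 64)
flat = shape[0] * shape[1] * shape[2]       # input width of dense1
print(shape, flat)  # (7, 7, 64) 3136
```

Under these assumptions, dense1 maps 3136 features to 1024 units and dense2 maps 1024 units to the 10 class logits.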
