# TensorFlow Input Pipeline Basics

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import pathlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# conda create -n tf tensorflow
# conda activate tf
import tensorflow as tf

from termcolor import cprint

np.set_printoptions(precision=4)

cprint("Imported!", 'green')

## Basic Mechanics

To create an input pipeline, we need a data source. For example, to construct a Dataset from data in memory, we can use `tf.data.Dataset.from_tensors()` or `tf.data.Dataset.from_tensor_slices()`. Alternatively, if our input data is stored in a file in the recommended TFRecord format, we can use `tf.data.TFRecordDataset()`.

In [None]:
dataset = tf.data.Dataset.from_tensor_slices([8, 3, 0, 8, 2, 1])
dataset
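
The two in-memory constructors behave differently, which the following minimal sketch tries to make concrete. It assumes TensorFlow 2.x with eager execution; the variable names (`values`, `whole`, `sliced`) are illustrative only. `from_tensors()` wraps the entire input as a single element, while `from_tensor_slices()` slices the input along its first axis and yields one element per slice.

```python
import tensorflow as tf

values = [8, 3, 0, 8, 2, 1]

# from_tensors: the whole list becomes ONE element of shape (6,).
whole = tf.data.Dataset.from_tensors(values)

# from_tensor_slices: the list is sliced along axis 0 into SIX scalar elements.
sliced = tf.data.Dataset.from_tensor_slices(values)

# Convert each element to plain Python values for easy comparison.
whole_elems = [elem.numpy().tolist() for elem in whole]
sliced_elems = [elem.numpy().tolist() for elem in sliced]

print(len(whole_elems), whole_elems)
print(len(sliced_elems), sliced_elems)
```

In most training pipelines `from_tensor_slices()` is the one to reach for, since each slice typically corresponds to one example.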

The `dataset` object is a Python iterable, so we can consume it in a `for` loop, or create an iterator explicitly with `iter()` and fetch elements one at a time with `next()`:

In [None]:
for elem in dataset:
    cprint(elem, 'cyan')

it = iter(dataset)
cprint(next(it).numpy(), 'blue')
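
Besides looping, a dataset can be consumed in a couple of other ways. The sketch below assumes TensorFlow 2.1 or later (where `Dataset.as_numpy_iterator()` is available) and rebuilds the same six-element dataset so that it runs on its own. It shows `as_numpy_iterator()`, which yields NumPy values rather than `tf.Tensor` objects, and `Dataset.reduce()`, which folds all elements into a single result.

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([8, 3, 0, 8, 2, 1])

# as_numpy_iterator() yields NumPy scalars, so no .numpy() call is needed.
values = [int(v) for v in dataset.as_numpy_iterator()]
print(values)

# reduce() walks the whole dataset, carrying a running state;
# here the state is the sum of the elements seen so far.
total = dataset.reduce(0, lambda state, value: state + value).numpy()
print(total)
```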

### Dataset Structure

A dataset contains elements that each have the same (nested) structure. The individual components of that structure can be of any type representable by `tf.TypeSpec`, including `Tensor`, `SparseTensor`, `RaggedTensor`, `TensorArray`, or `Dataset`.

The `Dataset.element_spec` property allows you to inspect the type of each element component. The property returns a nested structure of `tf.TypeSpec` objects, matching the structure of the element, which may be a single component, a tuple of components, or a nested tuple of components. For example:

In [None]:
dataset_1 = tf.data.Dataset.from_tensor_slices(tf.random.uniform([4, 10]))

dataset_1.element_spec


In [None]:
dataset_2 = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([4]),
     tf.random.uniform([4, 100], maxval=100, dtype=tf.int32)))

dataset_2.element_spec

In [None]:
dataset_3 = tf.data.Dataset.zip((dataset_1, dataset_2))

dataset_3.element_spec

In [None]:
# Dataset containing a sparse tensor.
dataset_4 = tf.data.Dataset.from_tensors(
    tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4]))

dataset_4.element_spec


In [None]:
# Use value_type to see the type of value represented by the element spec
dataset_4.element_spec.value_type

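Dataset transformations support datasets of any structure. When using the `Dataset.map()` and `Dataset.filter()` transformations, which apply a function to each element, the element structure determines the arguments of the function. For instance, each element of the single-component dataset below would be passed to such a function as one tensor argument: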

In [None]:
dataset_1 = tf.data.Dataset.from_tensor_slices(
    tf.random.uniform([4, 10], minval=1, maxval=10, dtype=tf.int32))

print(dataset_1)

for data in dataset_1:
    print(data.numpy())
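
To make the link between element structure and function arguments concrete, here is a minimal sketch assuming TensorFlow 2.1 or later with eager execution. The datasets and the `to_list` helper are hypothetical, introduced only for illustration. A single-component element arrives as one argument, a tuple element is unpacked into one argument per component, and a nested element is unpacked one level only, so the inner tuple arrives as a single argument.

```python
import tensorflow as tf


def to_list(ds):
    """Collect a dataset of scalars into a list of plain Python values."""
    return [v.tolist() for v in ds.as_numpy_iterator()]


# Single-component elements: the function takes one argument.
single = tf.data.Dataset.from_tensor_slices([8, 3, 0, 8, 2, 1])
doubled = single.map(lambda x: x * 2)
nonzero = single.filter(lambda x: x > 0)

# Tuple elements: the function takes one argument per component.
pairs = tf.data.Dataset.from_tensor_slices(([1, 2, 3], [10, 20, 30]))
summed = pairs.map(lambda a, b: a + b)

# Nested elements (x, (a, b)): only the top level is unpacked,
# so the second argument is itself a tuple.
nested = tf.data.Dataset.zip((single.take(3), pairs))
flattened = nested.map(lambda x, ab: x + ab[0] + ab[1])

print(to_list(doubled))
print(to_list(nonzero))
print(to_list(summed))
print(to_list(flattened))
```

Checking `element_spec` before writing the mapped function is a quick way to see how many arguments it will receive.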

## Reading Input Data

The TensorFlow `tf.data` guide covers reading from specific kinds of data sources in more detail:

#### [Consuming NumPy arrays](https://www.tensorflow.org/guide/data#consuming_numpy_arrays)

#### [Consuming Python generators](https://www.tensorflow.org/guide/data#consuming_python_generators)

#### [Consuming TFRecord data](https://www.tensorflow.org/guide/data#consuming_tfrecord_data)

#### [Consuming text data](https://www.tensorflow.org/guide/data#consuming_text_data)

#### [Consuming CSV data](https://www.tensorflow.org/guide/data#consuming_csv_data)

#### [Consuming sets of files](https://www.tensorflow.org/guide/data#consuming_sets_of_files)

## Batching dataset elements

### Simple Batching

### Batching Tensors with padding

## Training workflows

### Processing multiple epochs

### Randomly shuffling input data

## Preprocessing data

### Decoding image data and resizing it

### Applying arbitrary Python logic

### Parsing tf.Example protocol buffer messages

### Time series windowing

### Resampling

## Iterator Checkpointing

## Using high-level APIs

