# How it Works

WebDataset is powerful, and it may look complex from the outside, but its structure is quite simple: most of
the code consists of functions that map an input iterator to an output iterator:

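```Python
import torch

def add_noise(source, noise=0.01):
    # Consume (inputs, targets) pairs and yield them with Gaussian noise added to the inputs.
    for inputs, targets in source:
        inputs = inputs + noise * torch.randn_like(inputs)
        yield inputs, targets
```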

To write a new processing stage, a function like this is all you ever need.
The rest is just bookkeeping: we need to invoke functions like this again for every epoch,
and we need to chain them together.
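The bookkeeping problem is easy to see with plain Python generators. The minimal sketch below uses two toy stages (`scale` and `add_offset` are hypothetical, not part of WebDataset) on a list of numbers instead of tensors: chaining is just nesting calls, but a generator chain is exhausted after a single pass, so it has to be rebuilt for every epoch.

```Python
def scale(source, factor=2):
    # Toy stage: multiply every item by `factor`.
    for x in source:
        yield x * factor


def add_offset(source, offset=1):
    # Toy stage: add `offset` to every item.
    for x in source:
        yield x + offset


data = [1, 2, 3]

# Chaining: each stage consumes the iterator produced by the previous one.
pipeline = add_offset(scale(data, factor=10), offset=1)
print(list(pipeline))  # [11, 21, 31]
print(list(pipeline))  # []: the generator chain is exhausted after one pass


def make_pipeline():
    # Bookkeeping: rebuild the chain from scratch at the start of each epoch.
    return add_offset(scale(data, factor=10), offset=1)


for epoch in range(2):
    print(epoch, list(make_pipeline()))  # prints 0 [11, 21, 31], then 1 [11, 21, 31]
```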

To turn a function like that into an `IterableDataset`, and chain it with an existing dataset, you can use the `webdataset.Processor` class:

```Python
import webdataset

# `dataset` is any existing dataset that yields (inputs, targets) pairs.
noisy_dataset = webdataset.Processor(add_noise, noise=0.02)(dataset)
```
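Conceptually, a wrapper like this only needs to remember the stage function and its arguments, and to rebuild the generator chain each time it is iterated. The following is a simplified, dependency-free sketch under that assumption; `MiniProcessor` is a hypothetical name, not the library's class, and unlike `wds.Processor` it does not subclass `torch.utils.data.IterableDataset` (a real version would, so that PyTorch's `DataLoader` treats it as an iterable-style dataset).

```Python
class MiniProcessor:
    """Hypothetical, simplified stand-in for a Processor-style wrapper."""

    def __init__(self, f, *args, **kw):
        # Remember the stage function and its extra arguments.
        self.f, self.args, self.kw = f, args, kw
        self.source = None

    def __call__(self, source):
        # Return a new wrapper bound to `source`, so the unbound one can be reused.
        bound = MiniProcessor(self.f, *self.args, **self.kw)
        bound.source = source
        return bound

    def __iter__(self):
        # Runs once per epoch: iter() restarts the upstream stages, and the
        # stage function then transforms their output lazily.
        if self.source is None:
            raise ValueError("call the processor on a source first")
        return self.f(iter(self.source), *self.args, **self.kw)


def add_offset(source, offset=1):
    # Toy stage (hypothetical) with the same shape as add_noise.
    for x in source:
        yield x + offset


data = [1, 2, 3]
ds = MiniProcessor(add_offset, offset=10)(data)
ds = MiniProcessor(add_offset, offset=100)(ds)
print(list(ds))  # [111, 112, 113]
print(list(ds))  # [111, 112, 113]: iterating again restarts the whole chain
```

Because `__iter__` calls `iter()` on its source, iterating the outermost wrapper restarts every upstream stage, which is exactly the per-epoch bookkeeping a function like `add_noise` should not have to handle itself.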

The `webdataset.WebDataset` class is just a wrapper for `Processor` with a default initial processing pipeline and some convenience methods. Fully expanded, the pipeline above can be written as:

```Python
import webdataset as wds

# `url` names the tar shard(s) to read; `add_noise` is the stage defined above.
dataset = wds.ShardList(url)
dataset = wds.Processor(wds.url_opener)(dataset)
dataset = wds.Processor(wds.tar_file_expander)(dataset)
dataset = wds.Processor(wds.group_by_keys)(dataset)
dataset = wds.Processor(wds.shuffle, 100)(dataset)
dataset = wds.Processor(wds.decode, "torchrgb")(dataset)
noisy_dataset = wds.Processor(add_noise, noise=0.02)(dataset)
```
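Most of these stages are small generator functions of the same shape as `add_noise`. As an illustration, here is a simplified sketch of the kind of work a grouping stage such as `wds.group_by_keys` does, assuming the WebDataset convention that consecutive files sharing a base name (everything before the first dot) belong to the same sample. `group_by_keys_sketch` is hypothetical and omits details, such as error handling, that the real stage may have.

```Python
def group_by_keys_sketch(entries):
    """Hypothetical grouping stage over (filename, data) pairs in tar order."""
    current = None
    for fname, data in entries:
        # Split "dir/0001.jpg" into key "dir/0001" and extension "jpg".
        prefix, sep, base = fname.rpartition("/")
        stem, _, ext = base.partition(".")
        key = prefix + sep + stem
        # A new key means the previous sample is complete.
        if current is not None and current["__key__"] != key:
            yield current
            current = None
        if current is None:
            current = {"__key__": key}
        current[ext] = data
    # Emit the last sample at the end of the stream.
    if current is not None:
        yield current


files = [
    ("train/0001.jpg", b"<jpeg bytes>"),
    ("train/0001.cls", b"3"),
    ("train/0002.jpg", b"<jpeg bytes>"),
    ("train/0002.cls", b"7"),
]
for sample in group_by_keys_sketch(files):
    print(sample["__key__"], sorted(sample))
# train/0001 ['__key__', 'cls', 'jpg']
# train/0002 ['__key__', 'cls', 'jpg']
```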

A `wds.Processor` is just an `IterableDataset`: you can use it wherever you would use an `IterableDataset`, and you can mix the two styles freely.

This means you can also reuse WebDataset processors with existing `IterableDataset` implementations, for instance to add shuffling, caching, or batching to them. Let's say you have a class `MySqlIterableDataset` that iterates over samples from an SQL database, and you want to shuffle and batch the results. You can write:

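```Python
import webdataset as wds

# MySqlIterableDataset is your own IterableDataset; add_noise is the stage defined above.
dataset = MySqlIterableDataset(database_connection)
dataset = wds.Processor(wds.shuffle, 100)(dataset)  # buffered shuffle over 100 samples
dataset = wds.Processor(wds.batch, 16)(dataset)  # batches of 16 samples
noisy_dataset = wds.Processor(add_noise, noise=0.02)(dataset)
```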
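The shuffling and batching stages used here follow the same generator pattern. The sketch below shows one plausible way to write them; `shuffle_stage` and `batch_stage` are hypothetical stand-ins, not the implementations of `wds.shuffle` and `wds.batch` (the library's batching, for instance, may also collate samples into tensors). A plain `range` stands in for rows coming from the database.

```Python
import random


def shuffle_stage(source, bufsize=100, rng=None):
    """Hypothetical buffered shuffle: keep up to `bufsize` items, emit a random one."""
    rng = rng if rng is not None else random.Random()
    buf = []
    for item in source:
        buf.append(item)
        if len(buf) >= bufsize:
            # Swap a random element to the end and pop it (O(1) removal).
            i = rng.randrange(len(buf))
            buf[i], buf[-1] = buf[-1], buf[i]
            yield buf.pop()
    # End of stream: drain the remaining buffer in random order.
    rng.shuffle(buf)
    yield from buf


def batch_stage(source, batchsize=16, partial=True):
    """Hypothetical batching: group consecutive items into lists of `batchsize`."""
    batch = []
    for item in source:
        batch.append(item)
        if len(batch) == batchsize:
            yield batch
            batch = []
    # Optionally emit the final, shorter batch.
    if batch and partial:
        yield batch


rows = range(40)  # stand-in for rows read from a database
batches = list(batch_stage(shuffle_stage(rows, bufsize=10, rng=random.Random(0)), batchsize=16))
print([len(b) for b in batches])  # [16, 16, 8]
```

The shuffle keeps only `bufsize` items in memory, so it works on streams of unknown length, at the cost of being only a partial shuffle: larger buffers mix the stream more thoroughly.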