# Exercise on data preparation
Working with neural networks involves managing vast amounts of data, and when you're dealing with multiple datasets, things can quickly spiral into chaos.<br> In order to keep the entropy of the system low, this notebook introduces you to a fundamental tool for data management: dictionaries. <br>Dictionaries provide an elegant solution to maintain structure and clarity in your data, making it easier to work with various sets of input and labeled output data.


But that's not all. We'll also delve into the power of automation, demonstrating how dictionaries can be harnessed to streamline tasks like dataset creation.<br> This notebook will provide hands-on exercises on dictionaries, functions and for loops, equipping you with essential skills to tackle your upcoming experiment on neural networks.

# Tasks
## Library import
In this exercise we work with PyTorch (`torch`), a go-to library for machine learning.<br> Its flexibility and powerful tools simplify the implementation of neural networks, making it a good choice for training and deploying models efficiently.<br> If you're familiar with NumPy, you'll find working with PyTorch to be similar, which eases the transition into deep learning.

So as a first step, import torch.



## Labeled data
Our objective is to produce a labeled dataset, where each input value is labeled with an associated target value. Here, the input values represent a time series, while the target values are related to the distance covered.  <br><br>
To do so, we create two tensors: `data_input` and `data_target`.<br> The task is deliberately simple: each target value is the corresponding input value multiplied by 2.

| data_input   | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 |
| --- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --- |
| data_target  | 0  | 2  | 4  | 6  | 8  | 10 | 12 | 14 | 16 | 18 | 20 |
### Task 1
1. **Create the `data_input` tensor:**
   - Use the `torch.arange` function to generate a tensor with all integers from 0 to 10 (inclusive). Keep in mind that `torch.arange` excludes its end value. This will be our input data.
   <br><br>
2. **Create the `data_target` tensor:**
   - Generate another tensor, `data_target`, by multiplying the `data_input` tensor by two.
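
As a hint for both steps, here is a minimal sketch on a different range, so it is not the solution itself. The names `demo_input` and `demo_target` are made up for this illustration. It shows that `torch.arange(start, end)` stops one step before `end`, and that arithmetic on a tensor acts on every element.

```python
import torch

# torch.arange(start, end) excludes `end`, just like Python's range().
demo_input = torch.arange(3, 7)

# Multiplying a tensor by a number multiplies every element.
demo_target = demo_input * 3

print("demo_input", demo_input)    # contains 3, 4, 5, 6
print("demo_target", demo_target)  # contains 9, 12, 15, 18
```

To include the upper boundary in your own tensor, the end value passed to `torch.arange` has to be one larger than the last integer you want.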


In [None]:
# data_input =

In [None]:
# data_target =

**Checkpoint:**

Run the following cell. Your result should match the expected output:

```python
data_input tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
data_target tensor([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])
```


In [1]:
# print("data_input", data_input)
# print("data_target", data_target)

So far so good. But this is quite time-consuming if we want to create many different sets of labeled data. A good way around this is a function that takes the boundaries of a time domain as input and creates a set of labeled data. Let's call this function `create_labeled_data` and implement it:
### Task 2
1. **Implement a function `create_labeled_data`:**
   - The function should generate a `data_input` and `data_target` tensor and store them in a dictionary.
   - A step by step explanation is given in the docstring of the function.
   <br><br>
2. **Create a dictionary `labels_dict` by executing the `create_labeled_data` function:**
   - Use the boundary values `t_min = 0` and `t_max = 10`.
   - Print your dictionary.

Hint:  
If you have never come across a Python **dictionary** before, you can find an introduction [at this source](https://www.w3schools.com/python/python_dictionaries.asp).
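
As a quick illustration of the idea, here is a small sketch of a function that returns a dictionary. It uses plain Python lists instead of tensors, and the function `describe_range` and its keys are invented for this example only.

```python
def describe_range(start, stop):
    """Return a dictionary summarising an integer range (illustrative helper)."""
    values = list(range(start, stop + 1))  # +1 so that `stop` is included
    # Each key (a string) is linked to one of the variables created above.
    return {"values": values, "length": len(values)}


summary = describe_range(2, 5)
print(summary["values"])  # access an entry by its key
print(summary["length"])
```

The same pattern applies to your task: compute the values first, then return them under descriptive keys.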

In [None]:
def create_labeled_data(t_min, t_max):
    """
    Creates labeled data and stores it in a dictionary.

    Step 1: Generate Data
            - Create 'data_input' and 'data_target' tensors as in Task 1.
            - As time boundaries use t_min and t_max.

    Step 2: Return Dictionary
            - Name the keys "input" and "target".
            - Link the keys to the appropriate variables created in Step 1.

    Parameters
    ----------
    t_min : int
        The minimum value for the input data.
    t_max : int
        The maximum value for the input data.

    Returns
    -------
    dict
        A dictionary containing the input and target data.
        The dictionary has two entries:
        - "input": a tensor containing the input data
        - "target": a tensor containing the target data
    """
    # data_input = 
    # data_target = 

    pass  # replace the pass with a return statement that returns the dictionary

In [None]:
# Create a dictionary called "labels_dict" and print it.

# labels_dict = 
# print("labels_dict: ", labels_dict)

# Automation
We are now able to create a dictionary with labeled data using the `create_labeled_data` function. But what if we want to create several of them with different time boundaries? The hard way is to create each of them by hand. The smart way is to employ a for loop that iterates over a list of time boundaries. This way, we can maintain multiple dictionaries, each containing labeled data associated with distinct time intervals, within a single overarching dictionary. In other words, we generate a [nested dictionary](https://www.w3schools.com/python/python_dictionaries_nested.asp).  
Here is how:

### Task 3
1. **Generate a `data_dict` that contains several `labels_dict` dictionaries:**
   - Create a list called `looplist` containing 3 tuples of your choice, each of the form (`t_min`, `t_max`).
   - Set up an empty dictionary called `data_dict`.
   - Build a for loop that iterates over the tuples (`t_min`, `t_max`) in the `looplist`. Use the `enumerate` function to generate a counter.
   - **Inside the loop,** 
      - Create a dictionary `labels_dict` by calling the `create_labeled_data` function.
      - Define a variable called `key` that stores dynamically generated keys. Use a [formatted string](https://builtin.com/data-science/python-f-string) to create a unique key for each iteration (e.g., `"data_0"`, `"data_1"`, `"data_2"`).
      - Store the generated data under its associated key in the `data_dict`. 
   - Outside the loop, print or display the `data_dict` to view the stored data.
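
The loop pattern described above can be sketched on unrelated data. The list `cities`, the dictionary `lookup` and the key prefix `city_` are made up for this illustration and are not part of the exercise.

```python
# A list of tuples, each holding two related values.
cities = [("Berlin", 52.5), ("Madrid", 40.4)]

# Empty outer dictionary that will hold one inner dictionary per tuple.
lookup = {}

# enumerate yields a counter i alongside each tuple, which is unpacked directly.
for i, (name, latitude) in enumerate(cities):
    key = f"city_{i}"  # formatted string: "city_0", "city_1", ...
    lookup[key] = {"name": name, "latitude": latitude}

print(lookup)
print(lookup["city_1"]["name"])  # nested access: outer key first, then inner key
```

In a nested dictionary, the first pair of brackets selects the inner dictionary and the second pair selects an entry inside it.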


In [3]:
# Define a list of tuples, where each tuple represents time range limits
# looplist = 

# Initialize an empty dictionary to store labeled data
# data_dict = 

# Loop through the list of time range tuples
# for i, (t_min, t_max) in enumerate(looplist):
    
    # Call create_labeled_data to create a labeled-data dictionary for the current time range
    # labels_dict = 
    
    # Create a unique key using a formatted string
    # key = 
    
    # Store the labeled data in the dictionary with the unique key
    # data_dict[key] = 

# Print the dictionary containing labeled data

**Good job, you are now well prepared for the experiment.**