<font color='darkred'> Unless otherwise noted, **this notebook will not be reviewed or autograded.**</font> You are welcome to use it for scratchwork, but **only the files listed in the exercises will be checked.**

---

# Background

You've just been hired as a Pokémon researcher, and your assignment is to maintain a dataset of sightings from trainers worldwide. Future submissions may include duplicate entries, missing values, or inconsistent formats. Entries like these can cause the system to crash, so you need to build a validation process for newly received data.

The current (existing) entry data includes a collection of **first** sightings from various trainers.

In [None]:
import pandas as pd

In [2]:
# load synthetic data for this exercise
df = pd.read_csv('https://raw.githubusercontent.com/leontoddjohnson/datasets/main/data/pokemon/pokemon_entries.csv')

In [4]:
# sample from current entries
df.sample(5, random_state=33)

Unnamed: 0,pokemon_id,trainer_name,pokemon_type,seen_time
56,35,Trainer_34,Grass,2023-07-17 00:00:00
90,41,Trainer_8,Normal,2023-12-14 16:00:00
95,97,Trainer_77,Grass,2023-12-26 00:00:00
82,130,Trainer_60,Normal,2023-11-10 20:00:00
60,90,Trainer_23,Normal,2023-08-23 04:00:00


# Exercises

Over the next few exercises, we will implement a system that validates new Pokémon entries, handling errors and ensuring necessary information is present. A new entry should be in the form of a dictionary:

```python
entry_example = {
    'pokemon_id': 78,
    'trainer_name': 'Trainer_48',
    'pokemon_type': 'normal',
    'seen_time': '2023-12-28 20:00:00'
}
```

All but the last exercise below will be done by editing the *apputil.py* file. The last exercise involves creating a unit testing suite in a separate Python file. In all cases, make sure to **use `assert`, `try`, and `except` appropriately.**

## Exercise 1

If an entry's `pokemon_id`-`trainer_name`-`seen_time` combination already exists in the existing data, then it is a duplicate.

Create a function called `is_duplicate(entry)` that returns `True` if the `entry` is a duplicate, and `False` otherwise.

*Hint: Notice that the `trainer_name` is case insensitive. So, `Trainer_48` is the same as `trainer_48`.*
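One way to approach the comparison is sketched below. It is only an illustration under stated assumptions: `existing` is a small hypothetical stand-in for the `df` loaded above, and the optional `data` parameter is added here so the sketch runs on its own (the exercise itself asks for `is_duplicate(entry)`).

```python
import pandas as pd

# Hypothetical stand-in for the existing entries (the real data is `df`).
existing = pd.DataFrame({
    'pokemon_id': [28, 35],
    'trainer_name': ['Trainer_39', 'Trainer_34'],
    'pokemon_type': ['Water', 'Grass'],
    'seen_time': ['2023-01-06 08:00:00', '2023-07-17 00:00:00'],
})


def is_duplicate(entry, data=existing):
    """Return True if the id-trainer-time combination already exists."""
    try:
        entry_id = int(entry['pokemon_id'])
        entry_name = str(entry['trainer_name']).strip().lower()
        entry_time = pd.Timestamp(entry['seen_time'])
    except (KeyError, TypeError, ValueError):
        # Assumption: an entry that cannot be read is not treated
        # as a duplicate; other checks are expected to reject it.
        return False

    same_id = data['pokemon_id'] == entry_id
    # Trainer names are compared case-insensitively, per the hint.
    same_name = data['trainer_name'].str.lower() == entry_name
    same_time = pd.to_datetime(data['seen_time']) == entry_time
    return bool((same_id & same_name & same_time).any())
```

Building one boolean mask per field and combining them with `&` keeps each comparison easy to inspect separately when debugging.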



## Exercise 2

The `pokemon_type` must be one of the following: 'Water', 'Fire', 'Grass', 'Earth', 'Normal', or it can be empty (notice, these must be capitalized).

Write a function `clean_type(entry)` that does the following:

1. If the type is invalid, throw a `ValueError` with a descriptive message.
2. If the type is "empty"-ish (e.g., None, "", " ", etc.), return `None`.
3. If the type is valid, make sure it's capitalized, and return the correct string.

So for example, we could have

```python
clean_type(entry_example)
>> "Normal"
```
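The three rules above could be sketched roughly as follows. This is a minimal illustration, not a reference solution: the `VALID_TYPES` name is hypothetical, and the sketch assumes that non-string values other than `None` (for example numbers) count as invalid rather than empty.

```python
# Hypothetical constant holding the allowed types from the exercise text.
VALID_TYPES = {'Water', 'Fire', 'Grass', 'Earth', 'Normal'}


def clean_type(entry):
    """Return the capitalized type, None if empty, or raise ValueError."""
    raw = entry.get('pokemon_type')

    # Rule 2: None and whitespace-only strings are treated as empty.
    if raw is None or (isinstance(raw, str) and raw.strip() == ''):
        return None

    # Assumption: anything that is not a string is invalid.
    if not isinstance(raw, str):
        raise ValueError(f"pokemon_type must be a string, got {raw!r}")

    # Rule 3: normalize the case before checking membership.
    cleaned = raw.strip().capitalize()

    # Rule 1: reject anything outside the allowed set.
    if cleaned not in VALID_TYPES:
        raise ValueError(
            f"pokemon_type {raw!r} is not one of {sorted(VALID_TYPES)}"
        )
    return cleaned
```

Checking for emptiness before validity matters here, since an empty string would otherwise fail the membership test and raise an error instead of returning `None`.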

## Exercise 3

Write a function `clean_id(entry)` that does the following:

1. If the `pokemon_id` is negative or not an integer, throw a `ValueError` with a descriptive message.
2. If the `pokemon_id` is a string containing an integer (e.g., "123"), coerce it to an integer, and return that integer.
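A rough sketch of these two rules is shown below. It makes a few assumptions the exercise does not state: floats such as `12.0` and booleans are rejected as non-integers, and a valid integer input is returned unchanged.

```python
def clean_id(entry):
    """Return pokemon_id as a non-negative int, or raise ValueError."""
    raw = entry.get('pokemon_id')

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool):
        raise ValueError(f"pokemon_id {raw!r} is not an integer")

    if isinstance(raw, int):
        value = raw
    elif isinstance(raw, str):
        try:
            # Rule 2: coerce strings such as "123" to an integer.
            value = int(raw.strip())
        except ValueError:
            raise ValueError(
                f"pokemon_id {raw!r} is not an integer"
            ) from None
    else:
        # Assumption: floats, None, and other types are invalid.
        raise ValueError(f"pokemon_id {raw!r} is not an integer")

    # Rule 1: negative ids are invalid.
    if value < 0:
        raise ValueError(f"pokemon_id {value} must not be negative")
    return value
```

The `try`/`except` around `int(...)` replaces Python's generic conversion message with one that names the offending field.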

## Exercise 4

Recall that the current entry data includes **first** sightings from trainers. So, a new sighting for a trainer cannot be before a date already in the dataset.

Create a function `clean_time(entry)` that does the following:

1. If the `seen_time` cannot be parsed by `pd.Timestamp`, throw a `ValueError` with a descriptive message.
2. If the `seen_time` occurs *before* the earliest one in the dataset for that trainer, throw a `ValueError` with a descriptive message.
3. If the `seen_time` is valid, return the `pd.Timestamp` of that time.

For example, Trainer_39 cannot have seen a new Pokémon before 2023-01-06 at 8am:

In [9]:
df[df['trainer_name'].str.lower() == "trainer_39"]

Unnamed: 0,pokemon_id,trainer_name,pokemon_type,seen_time
0,28,Trainer_39,Water,2023-01-06 08:00:00
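The time check might be sketched as follows. Several things here are assumptions rather than part of the exercise: `existing` is a one-row hypothetical stand-in for `df`, the `data` parameter is added so the sketch runs on its own, "earliest" is read as the earliest sighting for the same trainer, and a trainer with no existing rows is accepted without a date comparison.

```python
import pandas as pd

# Hypothetical stand-in for the existing entries (the real data is `df`).
existing = pd.DataFrame({
    'trainer_name': ['Trainer_39'],
    'seen_time': ['2023-01-06 08:00:00'],
})


def clean_time(entry, data=existing):
    """Return seen_time as a pd.Timestamp, or raise ValueError."""
    raw = entry.get('seen_time')
    try:
        seen = pd.Timestamp(raw)
    except (ValueError, TypeError) as err:
        raise ValueError(f"seen_time {raw!r} could not be parsed") from err

    # A missing value can parse to NaT, which is not a usable time.
    if pd.isna(seen):
        raise ValueError("seen_time is missing")

    # Find this trainer's existing rows (case-insensitive match).
    name = str(entry.get('trainer_name', '')).strip().lower()
    mask = data['trainer_name'].str.lower() == name
    if mask.any():
        earliest = pd.to_datetime(data.loc[mask, 'seen_time']).min()
        if seen < earliest:
            raise ValueError(
                f"seen_time {seen} is before the first sighting {earliest}"
            )
    return seen
```

Re-raising with `from err` keeps the original parsing error attached to the new message, which helps when tracing a bad entry.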


## Exercise 5

Create a function **validate_entry** that combines the functions above. It should accept a single Pokémon entry in the form of a dictionary, and either return a clean entry or throw an error.


## Exercise 6

Use the `unittest` package to 