<font color='darkred'> Unless otherwise noted, **this notebook will not be reviewed or autograded.**</font> You are welcome to use it for scratchwork, but **only the files listed in the exercises will be checked.**

---

# Background

You've just been hired as a Pokémon researcher, and your assignment is to maintain a dataset of sightings from trainers worldwide. You expect that future inputs may include duplicate entries, missing values, or inconsistent formats. Entries like these can cause the system to crash, so you need to build a validation process for newly received data.

The current (existing) entry data includes a collection of **first** sightings from various trainers.

In [None]:
import pandas as pd
import numpy as np

In [None]:
# load synthetic data for this exercise
df = pd.read_csv('https://raw.githubusercontent.com/leontoddjohnson/datasets/main/data/pokemon/pokemon_entries.csv')

In [None]:
# sample from current entries
df.sample(5)

Unnamed: 0,pokemon_id,trainer_name,pokemon_type,seen_time
44,46,Trainer_81,Water,2023-05-17 00:00:00
93,75,Trainer_19,Grass,2023-12-20 12:00:00
20,63,Trainer_82,Normal,2023-03-01 16:00:00
10,79,Trainer_10,,2023-02-01 20:00:00
34,144,Trainer_50,Water,2023-04-15 20:00:00


**Input Requirements**

- The `pokemon_type` must be one of the following: 'Water', 'Fire', 'Grass', 'Earth', 'Normal', or it can be unknown.
    - These must be capitalized.
- The `pokemon_id` cannot be negative, and it must be an integer.
- Recall that the current entry data includes **first** sightings from trainers. So, a new sighting for a trainer cannot be dated before that trainer's sighting already in the dataset.
    - For example: Trainer_35 cannot have seen a new Pokémon before 2023-12-05 at noon.

# Exercises

Over the next few exercises, we will implement a system that validates new Pokémon entries, handling errors and ensuring necessary information is present. The following is a valid entry:

```python
{
    'pokemon_id': 78,
    'trainer_name': 'Trainer_48',
    'pokemon_type': 'Normal',
    'seen_time': '2023-12-28 20:00:00'
}
```

All but the last exercise below will be done by editing the *apputil\.py* file.

## Exercise 1

Create a function **validate_entry** that accepts a single Pokémon entry in the form of a dictionary as input. The function must deny duplicates: if a `pokemon_id`-`trainer_name`-`seen_time` combination already exists in the dataset, the entry is a duplicate. If the input is valid, the function returns the entry, cleaned as described in the list that follows.

The function must accomplish the following:

* If the new entry is a duplicate (see above), print a message stating so.
* If the `pokemon_type` is invalid, return an error.
* If the `pokemon_type` is missing or an empty string (""), set the `pokemon_type` in the data to be missing (using `None` or `np.nan`).
* If the `pokemon_type` is valid, but not in the right format (i.e., capitalized), then correct the format (e.g., consider using `str.capitalize()`).
* If the `pokemon_id` is negative or not an integer, return an error.
* If the `pokemon_id` is a string containing an integer (e.g., "123"), coerce it to be an integer.
* If the `seen_time` is invalid, return an error.
* *Note: The `trainer_name` is case insensitive. That is, "Trainer_1" is the same as "trainer_1", and either is fine.*
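
The `pokemon_type` rules in the list above could be sketched roughly as follows. This is only one possible approach, not the expected solution: the helper name `clean_pokemon_type` and the constant `VALID_TYPES` are made up for illustration, and treating whitespace-only strings and `NaN` as missing is an assumption.

```python
# Allowed types, copied from the Input Requirements
VALID_TYPES = {'Water', 'Fire', 'Grass', 'Earth', 'Normal'}

def clean_pokemon_type(raw_type):
    """Return a capitalized valid type, None if missing, else raise ValueError."""
    # None and NaN (NaN is the only value not equal to itself) count as missing
    if raw_type is None or raw_type != raw_type:
        return None
    if not isinstance(raw_type, str):
        raise ValueError(f"pokemon_type must be a string, got {raw_type!r}")
    # Assumption: empty or whitespace-only strings are also treated as missing
    if raw_type.strip() == "":
        return None
    fixed = raw_type.strip().capitalize()  # e.g. 'wATER' -> 'Water'
    if fixed not in VALID_TYPES:
        raise ValueError(f"Unknown pokemon_type: {raw_type!r}")
    return fixed
```

For instance, `clean_pokemon_type('fire')` gives `'Fire'`, while `clean_pokemon_type('')` gives `None`.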

Your function must **use `assert`, `try`, and `except` appropriately**. Also, any dictionary output by the function should seamlessly append to the "current" dataset.
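
As a rough illustration of combining `assert`, `try`, and `except`, here is a sketch for the `pokemon_id` rules only. The helper name `clean_pokemon_id` is hypothetical, and rejecting floats (even whole ones like `3.0`) and booleans is an assumption about what "must be an integer" means.

```python
import numbers

def clean_pokemon_id(raw_id):
    """Return raw_id as a non-negative int, or raise ValueError."""
    try:
        # bool is a subclass of int, so rule it out explicitly (assumption)
        assert not isinstance(raw_id, bool), "pokemon_id must be an integer"
        if isinstance(raw_id, str):
            raw_id = int(raw_id.strip())  # "123" -> 123; "abc" raises ValueError
        # numbers.Integral also covers NumPy integer types such as np.int64
        assert isinstance(raw_id, numbers.Integral), "pokemon_id must be an integer"
        assert raw_id >= 0, "pokemon_id cannot be negative"
    except (AssertionError, ValueError) as err:
        # Re-raise both failure kinds as a single, descriptive error
        raise ValueError(f"Invalid pokemon_id {raw_id!r}: {err}") from err
    return int(raw_id)
```

Converting every failure into a `ValueError` lets the caller handle all `pokemon_id` problems with one `except` clause.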

In [None]:
def validate_entry(new_entry):
    # your code here

    print('Function reports errors, and fixes problems ...')

    valid_entry = new_entry  # this is wrong, of course

    return valid_entry

In [None]:
new_entry = df.sample(1).iloc[0]

validate_entry(new_entry.to_dict())

Function reports errors, and fixes problems ...


{'pokemon_id': 34,
 'trainer_name': 'Trainer_48',
 'pokemon_type': 'Normal',
 'seen_time': Timestamp('2023-12-28 20:00:00')}

In [None]:
df[df['trainer_name'] == 'Trainer_6']

Unnamed: 0,pokemon_id,trainer_name,pokemon_type,seen_time
7,136,Trainer_6,Fire,2023-01-31 00:00:00


## Test Cases

A few examples:

```
{'pokemon_id': 12,
 'trainer_name': 'Trainer_8',
 'pokemon_type': 'Normal',
 'seen_time': '2022-01-04'}

>>> Error! ... the seen_time has no "time" associated with it
```
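
One way to catch a `seen_time` with no time component, like the case above, is to parse it against a strict format. This is a sketch: the helper name `parse_seen_time` is hypothetical, and the `YYYY-MM-DD HH:MM:SS` format is assumed from the sample entries.

```python
from datetime import datetime

# Assumed format, based on values such as '2023-12-28 20:00:00'
SEEN_TIME_FORMAT = '%Y-%m-%d %H:%M:%S'

def parse_seen_time(raw_time):
    """Parse a 'YYYY-MM-DD HH:MM:SS' value or raise ValueError."""
    try:
        # str() lets datetime-like objects through as well as plain strings
        return datetime.strptime(str(raw_time), SEEN_TIME_FORMAT)
    except ValueError as err:
        raise ValueError(
            f"Invalid seen_time {raw_time!r}: expected YYYY-MM-DD HH:MM:SS"
        ) from err
```

With this, `parse_seen_time('2022-01-04')` raises a `ValueError` because the string carries a date but no time.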

```
{'pokemon_id': 12,
 'trainer_name': 'Trainer_6',
 'pokemon_type': 'Fire',
 'seen_time': '2023-01-24 12:00:00'}

>>> Error! ... This is before the first sighting for Trainer_6
```
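
The first-sighting check behind this case might look something like the sketch below. The helper name `is_before_first_sighting` is hypothetical, and `current` is a one-row stand-in for `df` built from the Trainer_6 row shown earlier, not the real dataset.

```python
import pandas as pd

# Toy stand-in for `df`: only the Trainer_6 row displayed in the notebook
current = pd.DataFrame({
    'pokemon_id': [136],
    'trainer_name': ['Trainer_6'],
    'pokemon_type': ['Fire'],
    'seen_time': pd.to_datetime(['2023-01-31 00:00:00']),
})

def is_before_first_sighting(entry, current):
    """True if entry['seen_time'] precedes the trainer's earliest recorded sighting."""
    # trainer_name is case insensitive, so compare lower-cased names
    same_trainer = current['trainer_name'].str.lower() == entry['trainer_name'].lower()
    if not same_trainer.any():
        return False  # assumption: a trainer with no history has no constraint
    first_seen = pd.to_datetime(current.loc[same_trainer, 'seen_time']).min()
    return bool(pd.Timestamp(entry['seen_time']) < first_seen)

early = {'pokemon_id': 12, 'trainer_name': 'Trainer_6',
         'pokemon_type': 'Fire', 'seen_time': '2023-01-24 12:00:00'}
print(is_before_first_sighting(early, current))
```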

```
{'pokemon_id': 12,
 'trainer_name': 'Trainer_6',
 'pokemon_type': 'Fire',
 'seen_time': '2024-01-01 16:00:00'}

>>> Valid entry, appended to dataset.
```
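
To illustrate the duplicate check and the "appended to dataset" step, here is a minimal sketch. The helper names `is_duplicate` and `append_entry` are hypothetical, and `current` is again a one-row stand-in for `df`; using `pd.concat` is one option among several for appending.

```python
import pandas as pd

# Toy stand-in for `df`: only the Trainer_6 row displayed in the notebook
current = pd.DataFrame({
    'pokemon_id': [136],
    'trainer_name': ['Trainer_6'],
    'pokemon_type': ['Fire'],
    'seen_time': pd.to_datetime(['2023-01-31 00:00:00']),
})

def is_duplicate(entry, current):
    """True if the pokemon_id / trainer_name / seen_time combination already exists."""
    same = (
        (current['pokemon_id'] == entry['pokemon_id'])
        & (current['trainer_name'].str.lower() == entry['trainer_name'].lower())
        & (pd.to_datetime(current['seen_time']) == pd.Timestamp(entry['seen_time']))
    )
    return bool(same.any())

def append_entry(entry, current):
    """Return a new DataFrame with the entry added as the last row."""
    new_row = pd.DataFrame([entry])
    new_row['seen_time'] = pd.to_datetime(new_row['seen_time'])
    return pd.concat([current, new_row], ignore_index=True)

entry = {'pokemon_id': 12, 'trainer_name': 'Trainer_6',
         'pokemon_type': 'Fire', 'seen_time': '2024-01-01 16:00:00'}
if is_duplicate(entry, current):
    print('Duplicate entry, not appended.')
else:
    updated = append_entry(entry, current)
```

Returning a new DataFrame rather than modifying `current` in place keeps the original data intact if a later step fails.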