# Coding Temple's Data Analytics Program
---
## Python for DA: Weekend Project

For this weekend project, you will be connecting to the [Disney API](https://disneyapi.dev/) to create an ETL pipeline. Your project should contain:

- etl_pipeline.py
    - Loads in data from the API object for all characters
    - Stores required fields from the API to a DataFrame
        - name
        - all movies/shows the character appeared in
        - any allies
        - any enemies
        - any park attractions
    - Cleans the data
    - Performs any transformations/feature engineering you wish to complete
    - Stores the data in an ElephantSQL server
    - Stores the data in a .csv file

- notebook.ipynb
    - Contains all cells you used to test your code before loading it into the pipeline
    - Loads in the data from your .csv file
    - Conducts EDA on the data
    - Conducts an analysis of your dataset
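
The "all characters" requirement implies paging through the API, since a single request returns only one page of results. A minimal sketch follows, assuming each response carries an `info.nextPage` link next to `data`. The names `fetch_json`, `fetch_all_characters`, and `max_pages` are hypothetical helpers, not part of the assignment.

```python
from typing import Callable, Optional


def fetch_json(url: str) -> dict:
    """Default fetcher: GET a URL and decode the JSON body."""
    import requests  # imported here so the paging logic can be exercised offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def fetch_all_characters(
    start_url: str = "https://api.disneyapi.dev/character",
    get_json: Callable[[str], dict] = fetch_json,
    max_pages: Optional[int] = None,
) -> list:
    """Collect `data` from every page, following `info.nextPage` (assumed field)."""
    characters = []
    url = start_url
    pages_read = 0
    while url and (max_pages is None or pages_read < max_pages):
        payload = get_json(url)
        characters.extend(payload.get("data", []))
        # Stop when the API reports no further page
        url = (payload.get("info") or {}).get("nextPage")
        pages_read += 1
    return characters
```

Calling `fetch_all_characters(max_pages=2)` first is a cheap way to check the response shape before pulling every page.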

In [37]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
import psycopg2  # PostgreSQL driver used by SQLAlchemy
import sqlalchemy as sa

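# Request the character endpoint (this single call returns one page of results)
api_url = 'https://api.disneyapi.dev/character/'
response = requests.get(api_url)
response.raise_for_status()  # fail early on an HTTP error
data = response.json()['data']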

character_data = []
for character in data:
    # Flatten each list-valued field into a comma-separated string
    character_info = {
        'name': character['name'],
        'films': ', '.join(character['films']),
        'tvshows': ', '.join(character['tvShows']),
        'allies': ', '.join(character['allies']),
        'enemies': ', '.join(character['enemies']),
        'park_attractions': ', '.join(character['parkAttractions'])
    }
    character_data.append(character_info)

df = pd.DataFrame(character_data)
# SQLAlchemy 1.4+ requires the 'postgresql://' scheme; 'postgres://' is rejected
engine = sa.create_engine(r'postgresql://qphdwajm:Qj_RAtbX0GzcoxlPIHqEoOHnu96mcdNG@batyr.db.elephantsql.com/qphdwajm')
df.to_sql('DisneyApi', con=engine, index=False, if_exists='replace')
engine.dispose()
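
The connection string above embeds the database password in the code. One common alternative is to read it from the environment. The sketch below assumes a variable named `DATABASE_URL` (a hypothetical name) and also rewrites the legacy `postgres://` prefix that ElephantSQL-style URLs often use.

```python
import os


def database_url(env_var: str = "DATABASE_URL") -> str:
    """Read the connection string from the environment and normalise its scheme."""
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # SQLAlchemy 1.4+ does not accept the legacy 'postgres://' prefix
    if url.startswith("postgres://"):
        url = "postgresql://" + url[len("postgres://"):]
    return url
```

The engine could then be created with `sa.create_engine(database_url())`, which keeps the credentials out of the notebook and the pipeline file.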

In [38]:
csv_filename = 'disney_char.csv'
df.to_csv(csv_filename, index = False)

In [45]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

print("Basic Statistics:")
print(df.describe())

print("\nFirst Few Rows:")
print(df.head())

print("\nMissing Values:")
print(df.isnull().sum())



Basic Statistics:
            name films tvshows allies enemies park_attractions
count         50    50      50     50      50               50
unique        48    17      30      1       1                3
top     Achilles                                              
freq           2    34      15     50      50               48

First Few Rows:
                  name films                             tvshows allies  \
0             'Olu Mel                                                    
1             .GIFfany                             Gravity Falls          
2                  627        Lilo & Stitch: The Series, Stitch!          
3                9-Eye                                                    
4  90's Adventure Bear                         Pickle and Peanut          

  enemies park_attractions  
0                           
1                           
2                           
3           The Timekeeper  
4                           

Missing Values:
name    
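
Note that `isnull()` only counts `NaN`/`None`, so the empty strings produced by joining empty lists are not reported as missing. A small sketch of counting them directly, using a hypothetical three-row sample shaped like the rows printed above:

```python
import pandas as pd

# Hypothetical sample mirroring a few rows of the DataFrame above
sample = pd.DataFrame({
    "name": ["'Olu Mel", ".GIFfany", "9-Eye"],
    "films": ["", "", ""],
    "tvshows": ["", "Gravity Falls", ""],
    "park_attractions": ["", "", "The Timekeeper"],
})

null_counts = sample.isnull().sum()   # empty strings are not NaN, so these are all zero
empty_counts = sample.eq("").sum()    # number of empty strings per column
empty_share = sample.eq("").mean()    # fraction of rows that are empty per column

print(empty_counts)
```

Running the same `eq("")` check on the full `df` gives a more honest picture of how sparse each column is.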

Based on the output above, this dataset is a poor basis for analysis because most columns are largely empty. In this 50-character sample, `allies` and `enemies` are empty for every row, `park_attractions` is empty for 48 of 50 rows, and `films` is empty for 34 of 50, so there are very few places where one can draw insights. One possibility is to compare the number of movies each character appears in with the number of TV shows, but that is mostly it.

Another possible use for the data is to pull the characters' images and build an application that uses them for purposes other than data analysis.
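
The movies-versus-TV-shows comparison mentioned above can be sketched by counting on the raw API lists before they are joined into strings, which avoids miscounting titles that themselves contain a comma. The function name `appearance_counts` and the sample records are hypothetical.

```python
def appearance_counts(character: dict) -> dict:
    """Count films and TV shows from a raw API record (lists, before joining)."""
    films = character.get("films") or []
    tv_shows = character.get("tvShows") or []
    return {
        "name": character.get("name"),
        "n_films": len(films),
        "n_tvshows": len(tv_shows),
    }


# Hypothetical records shaped like the API response used in the pipeline
records = [
    {"name": "627", "films": [], "tvShows": ["Lilo & Stitch: The Series", "Stitch!"]},
    {"name": "9-Eye", "films": [], "tvShows": []},
]
rows = [appearance_counts(record) for record in records]
```

The resulting `rows` list can be passed to `pd.DataFrame(rows)` and merged with the main table on `name` for plotting.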