In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import os
import re
import time
import json

## Problem Overview

As Star Wars data nerds, we have a few questions about the Universe. We don’t really like wikis, but we do
love APIs, code, charts, and graphs!

We’ve seen the Star Wars API (https://swapi.co), and its data seem useful. However, it leaves us with a few specific questions that we need answered in visual form:

1. It seems like there is quite a variety of heights in the Star Wars Universe. Show us the distribution of heights across gender, homeworld, and species.
2. The Original Trilogy and the Prequel Trilogy both featured men in leading roles and women in supporting roles, but the Sequel Trilogy features a leading woman. What is the distribution of genders across the films?
3. Back to our difficult-to-explain interest in heights: can you find and visualize a linear regression that clearly explains the height of an individual?

Using a programming language and/or framework of your choice, write a program that gives us insight into these questions. We like visualizations (e.g. charts and graphs), not tables or lists of numbers. If you have any other ideas, questions, or insights from the data, we’d love to see those as well!

## Navigating the API

The API is organized into six resource types:
 - Films, People, Planets, Species, Starships, and Vehicles

The Films resource gives us information about each film (title, opening crawl, director, etc.), the People resource gives us information about a person within the Star Wars universe (name, height, eye color, etc.), and so on. Since every question above is about people, it makes sense to focus on the People resource.
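As a small sketch, each of the six resource collections hangs off a common base URL. The snippet below assumes the `https://swapi.co/api/` base used in this notebook and the lowercase plural path names that appear in the API's own URLs (`films`, `people`, `planets`, `species`, `starships`, `vehicles`); `resource_urls` is a hypothetical helper name.

```python
# Base URL assumed from the requests made in this notebook
BASE_URL = 'https://swapi.co/api/'

# Path segment for each resource type, as it appears in the API's URLs
RESOURCES = ['films', 'people', 'planets', 'species', 'starships', 'vehicles']

def resource_urls(base=BASE_URL):
    """Map each resource type to the URL of its collection endpoint."""
    return {name: '{}{}/'.format(base, name) for name in RESOURCES}

urls = resource_urls()
print(urls['people'])  # https://swapi.co/api/people/
```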

In [2]:
url = 'https://swapi.co/api/people/'
req = requests.get(url)
req.raise_for_status()  # fail loudly on a 4xx/5xx response
a = req.json()  # decode the JSON body into a dict

In [3]:
a.keys()

dict_keys(['count', 'next', 'previous', 'results'])

In [4]:
print('Next: {}'.format(a['next']), 
      'Previous: {}'.format(a['previous']), 
      'Count: {}'.format(a['count']), sep='\n')

Next: https://swapi.co/api/people/?page=2
Previous: None
Count: 87


In [5]:
a['results']

[{'name': 'Luke Skywalker',
  'height': '172',
  'mass': '77',
  'hair_color': 'blond',
  'skin_color': 'fair',
  'eye_color': 'blue',
  'birth_year': '19BBY',
  'gender': 'male',
  'homeworld': 'https://swapi.co/api/planets/1/',
  'films': ['https://swapi.co/api/films/2/',
   'https://swapi.co/api/films/6/',
   'https://swapi.co/api/films/3/',
   'https://swapi.co/api/films/1/',
   'https://swapi.co/api/films/7/'],
  'species': ['https://swapi.co/api/species/1/'],
  'vehicles': ['https://swapi.co/api/vehicles/14/',
   'https://swapi.co/api/vehicles/30/'],
  'starships': ['https://swapi.co/api/starships/12/',
   'https://swapi.co/api/starships/22/'],
  'created': '2014-12-09T13:50:51.644000Z',
  'edited': '2014-12-20T21:17:56.891000Z',
  'url': 'https://swapi.co/api/people/1/'},
 {'name': 'C-3PO',
  'height': '167',
  'mass': '75',
  'hair_color': 'n/a',
  'skin_color': 'gold',
  'eye_color': 'yellow',
  'birth_year': '112BBY',
  'gender': 'n/a',
  'homeworld': 'https://swapi.co/api/pl

This is a very nice API. A GET request to the root "people" URL returns the first page of results together with the total `count` and a `next` link to the following page, which is everything we need to request all of the "people" resources page by page.
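The paging described above can be sketched as a loop that keeps following `next` until it is `None`, as it is on the last page. This is a minimal sketch, not the notebook's own code: `fetch_all_results` is a hypothetical name, and the `get_json` parameter is an assumption added so the loop can be exercised offline with canned pages instead of live HTTP calls.

```python
def fetch_all_results(url, get_json=None):
    """Follow the paginated `next` links and collect every item in `results`.

    `get_json` takes a URL and returns the decoded JSON dict. By default it
    uses requests; any callable with that shape works (e.g. for offline tests).
    """
    if get_json is None:
        import requests  # only needed for live requests

        def get_json(page_url):
            resp = requests.get(page_url)
            resp.raise_for_status()
            return resp.json()

    results = []
    while url is not None:  # the last page reports next == None
        page = get_json(url)
        results.extend(page['results'])
        url = page['next']
    return results
```

Called as `people = fetch_all_results('https://swapi.co/api/people/')`, the length of `people` should match `a['count']`.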

### People Attributes
- name: string -- The name of this person.
- birth_year: string -- The birth year of the person, using the in-universe standard of BBY or ABY - Before the Battle of Yavin or After the Battle of Yavin. The Battle of Yavin is a battle that occurs at the end of Star Wars episode IV: A New Hope.
- eye_color: string -- The eye color of this person. Will be "unknown" if not known or "n/a" if the person does not have an eye.
- gender: string -- The gender of this person. One of "male", "female", or "unknown", or "n/a" if the person does not have a gender.
- hair_color: string -- The hair color of this person. Will be "unknown" if not known or "n/a" if the person does not have hair.
- height: string -- The height of the person in centimeters.
- mass: string -- The mass of the person in kilograms.
- skin_color: string -- The skin color of this person.
- homeworld: string -- The URL of a planet resource, a planet that this person was born on or inhabits.
- films: array -- An array of film resource URLs that this person has been in.
- species: array -- An array of species resource URLs that this person belongs to.
- starships: array -- An array of starship resource URLs that this person has piloted.
- vehicles: array -- An array of vehicle resource URLs that this person has piloted.
- url: string -- The hypermedia URL of this resource.
- created: string -- The time this resource was created, as an ISO 8601 timestamp.
- edited: string -- The time this resource was last edited, as an ISO 8601 timestamp.
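Because height and mass arrive as strings, they need converting to numbers before any plotting or regression. A minimal sketch, assuming placeholder values such as "unknown" may appear in those fields and should become missing values (NaN); `to_numeric_columns` is a hypothetical helper, and stripping commas is only a precaution in case some values use a thousands separator.

```python
import pandas as pd

def to_numeric_columns(df, cols=('height', 'mass')):
    """Return a copy of df with the given string columns converted to floats.

    Non-numeric placeholders (e.g. 'unknown', 'n/a') become NaN.
    """
    out = df.copy()
    for col in cols:
        # Remove any thousands separators before parsing (precaution)
        cleaned = out[col].astype(str).str.replace(',', '', regex=False)
        out[col] = pd.to_numeric(cleaned, errors='coerce')
    return out

# Two rows from the API output above plus one placeholder row
sample = pd.DataFrame({'name': ['Luke Skywalker', 'R5-D4', 'Unknown person'],
                       'height': ['172', '97', 'unknown'],
                       'mass': ['77', '32', 'unknown']})
print(to_numeric_columns(sample))
```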

There are only 87 people in total (ten per page, so nine pages), so we can easily load all of this data into memory. It would be nice to put it into a Pandas DataFrame, but the data will need a little wrangling first. We can drop "url", "created", and "edited". The "films", "species", "starships", and "vehicles" attributes are returned as arrays, so we'll need some way to turn them into valid columns. For "films", for example, we might have one column per movie with a value of True or False in each row, indicating whether the person appears in that film.
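The indicator-column idea can be sketched with pandas' vectorized `Series.str.get_dummies`, as an alternative to looping row by row. Assumptions: each list holds resource URLs shaped like `https://swapi.co/api/films/2/`; `url_to_col` and `one_hot_list_column` are hypothetical helper names; the first row uses Luke Skywalker's film list from the output above, while "Person B" is a made-up row with no films. Unlike the True/missing flags built later in the notebook, this yields explicit False values.

```python
import pandas as pd

def url_to_col(resource_url):
    # 'https://swapi.co/api/films/2/' -> 'films_2'
    resource, resource_id = resource_url.rstrip('/').split('/')[-2:]
    return '{}_{}'.format(resource, resource_id)

def one_hot_list_column(df, col):
    """Expand a column of URL lists into one boolean column per URL."""
    # Join each list into a '|'-separated string of column names, then let
    # pandas build one 0/1 indicator column per distinct name (sorted).
    joined = df[col].apply(lambda urls: '|'.join(url_to_col(u) for u in urls))
    return joined.str.get_dummies(sep='|').astype(bool)

people = pd.DataFrame({
    'name': ['Luke Skywalker', 'Person B'],  # 'Person B' is hypothetical
    'films': [['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/6/',
               'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/',
               'https://swapi.co/api/films/7/'],
              []],
})
wide = pd.concat([people.drop(columns='films'),
                  one_hot_list_column(people, 'films')], axis=1)
print(wide)
```

The same call works for "species", "starships", and "vehicles"; concatenating all four results gives one flat, boolean-valued frame.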

In [14]:
column_list = ['name','birth_year','eye_color','gender','hair_color',
                'height','mass','skin_color','homeworld','films','species',
                'starships','vehicles']

In [23]:
df = pd.DataFrame(columns=column_list)

In [20]:
for i in column_list:
    print(a['results'][0][i])

Luke Skywalker
19BBY
blue
male
blond
172
77
fair
https://swapi.co/api/planets/1/
['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/6/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/', 'https://swapi.co/api/films/7/']
['https://swapi.co/api/species/1/']
['https://swapi.co/api/starships/12/', 'https://swapi.co/api/starships/22/']
['https://swapi.co/api/vehicles/14/', 'https://swapi.co/api/vehicles/30/']


In [33]:
column_list[:9]

['name',
 'birth_year',
 'eye_color',
 'gender',
 'hair_color',
 'height',
 'mass',
 'skin_color',
 'homeworld']

In [92]:
from src.data_analysis import web_utilities, df_utilities

In [93]:
df = df_utilities.get_initial_df(column_list[:9])

In [94]:
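def get_new_col_name(resource_url):
    # Assumed helper (its definition isn't shown in this notebook); the naming
    # is inferred from the output columns, e.g. '.../films/2/' -> 'films_2'
    resource, resource_id = resource_url.rstrip('/').split('/')[-2:]
    return '{}_{}'.format(resource, resource_id)

results = a['results']
non_row_keys = ['films', 'species', 'starships', 'vehicles']
rows = []
for person in results:
    # Copy the nine scalar attributes straight across
    row_dict = {c: person[c] for c in column_list[:9]}
    # Add one True flag per linked film/species/starship/vehicle
    for c in non_row_keys:
        for resource_url in person[c]:
            row_dict[get_new_col_name(resource_url)] = True
    rows.append(row_dict)
# Build the frame in one go; DataFrame.append was deprecated and removed in pandas 2.0
df = pd.DataFrame(rows)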

In [88]:
df

,name,birth_year,eye_color,gender,hair_color,height,mass,skin_color,homeworld,films_2,...,films_5,films_4,species_2,starships_13,starships_48,starships_59,starships_64,starships_65,starships_74,vehicles_38
0,Luke Skywalker,19BBY,blue,male,blond,172,77,fair,https://swapi.co/api/planets/1/,True,...,,,,,,,,,,
1,C-3PO,112BBY,yellow,,,167,75,gold,https://swapi.co/api/planets/1/,True,...,True,True,True,,,,,,,
2,R2-D2,33BBY,red,,,96,32,"white, blue",https://swapi.co/api/planets/8/,True,...,True,True,True,,,,,,,
3,Darth Vader,41.9BBY,yellow,male,none,202,136,white,https://swapi.co/api/planets/1/,True,...,,,,True,,,,,,
4,Leia Organa,19BBY,brown,female,brown,150,49,light,https://swapi.co/api/planets/2/,True,...,,,,,,,,,,
5,Owen Lars,52BBY,blue,male,"brown, grey",178,120,light,https://swapi.co/api/planets/1/,,...,True,,,,,,,,,
6,Beru Whitesun lars,47BBY,blue,female,brown,165,75,light,https://swapi.co/api/planets/1/,,...,True,,,,,,,,,
7,R5-D4,unknown,red,,,97,32,"white, red",https://swapi.co/api/planets/1/,,...,,,True,,,,,,,
8,Biggs Darklighter,24BBY,brown,male,black,183,84,light,https://swapi.co/api/planets/1/,,...,,,,,,,,,,
9,Obi-Wan Kenobi,57BBY,blue-gray,male,"auburn, white",182,77,fair,https://swapi.co/api/planets/20/,True,...,True,True,,,True,True,True,True,True,True


In [89]:
df = df_utilities.get_initial_df(column_list[:9])

In [91]:
df = df_utilities.add_to_df(df, results)

ValueError: If using all scalar values, you must pass an index
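The body of `add_to_df` isn't shown here, so the exact cause is an assumption, but this particular message is what pandas raises when a DataFrame is built from a dict whose values are all scalars, which is exactly what a single person's `row_dict` looks like. A minimal reproduction, followed by two common fixes (wrap the dict in a list, or pass an explicit index):

```python
import pandas as pd

# One row's worth of scalar values, shaped like row_dict in the loop above
row_dict = {'name': 'Luke Skywalker', 'height': '172', 'films_1': True}

# A dict of scalars gives pandas no row index to use, so this raises
try:
    pd.DataFrame(row_dict)
except ValueError as err:
    print(err)  # If using all scalar values, you must pass an index

# Fix 1: a list of dicts, where each dict is one row
one_row = pd.DataFrame([row_dict])

# Fix 2: keep the dict but supply the index explicitly
also_one_row = pd.DataFrame(row_dict, index=[0])
```

Collecting the row dicts in a list and calling `pd.DataFrame` once at the end (fix 1) also avoids growing the frame one row at a time.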