# Writing Efficient Python Code
Run the hidden code cell below to import the data used in this course.

In [1]:
# Importing pandas
import pandas as pd
import numpy as np

# Reading in the data
baseball = pd.read_csv("baseball.csv")

## Writing Efficient Python Code

**efficient**
* fast runtime
* small memory footprint

```python

# non-Pythonic

numbers = [1, 2, 3, 4]  # example input
doubled_numbers = []

for i in range(len(numbers)):
    doubled_numbers.append(numbers[i] * 2)

# Pythonic - shorter code and faster

doubled_numbers = [x * 2 for x in numbers]

```

In [7]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Print the list created using the Non-Pythonic approach
i = 0
new_list= []
while i < len(names):
    if len(names[i]) >= 6:
        new_list.append(names[i])
    i += 1
print(new_list)

# Print the list created by looping over the contents of names
better_list = []
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print(better_list)

# Print the list created by using list comprehension
best_list = [name for name in names if len(name) >= 6]
print(best_list)


['Kramer', 'Elaine', 'George', 'Newman']
['Kramer', 'Elaine', 'George', 'Newman']
['Kramer', 'Elaine', 'George', 'Newman']


In [8]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### Python standard library

* Built-in types: 
  * list, tuple, set, dict, and others
* Built-in functions: 
  * print(), len(), range(), enumerate(), map(), zip(), and others
* Built-in modules
  * os, sys, itertools, collections, math, and others

`range()` - creates a sequence of integers

```python

# range(start, stop)
nums = range(0,11)
nums_list = list(nums)

# range(stop)
nums = range(11)
nums_list = list(nums)

# range(start, stop, step)
even_nums = range(2, 11, 2)
even_nums_list = list(even_nums)
```

`enumerate()` - creates an indexed sequence of (index, item) pairs

```python
letters = ['a', 'b', 'c', 'd']
indexed_letters = enumerate(letters)
indexed_letters_list = list(indexed_letters)
indexed_letters2 = enumerate(letters, start=5)
indexed_letters2_list = list(indexed_letters2)
```

`map()` - applies function to each element in an object

```python
nums = [1.5, 2.3, 3.4, 4.6, 5.0]
rnd_nums = map(round, nums)
list(rnd_nums)

# map() with lambda (anonymous function)

nums = [1, 2, 3, 4, 5]
sqrd_nums = map(lambda x: x**2, nums)
list(sqrd_nums)
```
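
One detail the examples above rely on: `map()` and `enumerate()` return one-shot iterators, so they have to be converted (or unpacked) before the values can be reused, whereas a `range` object can be iterated repeatedly. A minimal sketch reusing the `nums` example (the names `first_pass` and `second_pass` are only for illustration):

```python
nums = [1.5, 2.3, 3.4, 4.6, 5.0]

# map() returns an iterator; nothing is computed until it is consumed
rnd_nums = map(round, nums)

first_pass = list(rnd_nums)   # consumes the iterator
second_pass = list(rnd_nums)  # the iterator is now exhausted, so this is empty

print(first_pass)
print(second_pass)

# A range object, by contrast, can be iterated more than once
r = range(3)
print(list(r), list(r))
```

If the values are needed more than once, store the list rather than the map object.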


In [9]:
# Create a range object that goes from 0 to 5
nums = range(6)
print(type(nums))

# Convert nums to a list
nums_list = list(nums)
print(nums_list)

# Create a new list of odd numbers from 1 to 11 by unpacking a range object
nums_list2 = [*range(1,12,2)]
print(nums_list2)

<class 'range'>
[0, 1, 2, 3, 4, 5]
[1, 3, 5, 7, 9, 11]


In [10]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Rewrite the for loop to use enumerate
indexed_names = []
for i,name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name) 
print(indexed_names)

# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i,name) for i,name in enumerate(names)]
print(indexed_names_comp)

# Unpack an enumerate object with a starting index of one
indexed_names_unpack = [*enumerate(names, start=1)]
print(indexed_names_unpack)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(1, 'Jerry'), (2, 'Kramer'), (3, 'Elaine'), (4, 'George'), (5, 'Newman')]


In [11]:
# Use map to apply str.upper to each element in names
names_map  = map(str.upper, names)

# Print the type of the names_map
print(type(names_map))

# Unpack names_map into a list
names_uppercase = [*names_map]

# Print the list created above
print(names_uppercase)

<class 'map'>
['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']


### numpy

* NumPy array - fast and memory-efficient alternative to Python lists
  * np.array([1, 2, 3])
  * homogeneous - all elements of an array must have the same type
  * use .dtype to check the element type
  * array broadcasting - NumPy vectorises operations so they are performed on all elements at once - efficient!
  * NumPy indexing - more efficient - for example, returning a column from a list of lists needs a list comprehension, but with a 2-D array: nums_np[:,0]
  * Boolean indexing - can create a Boolean mask for filtering. For example, to get the positive numbers in an array: nums_np[nums_np > 0]

In [15]:
nums = np.array([[ 1,  2,  3,  4,  5],
                 [ 6,  7,  8,  9, 10]])

# Print second row of nums
print(nums[1,:])

# Print all elements of nums that are greater than six
print(nums[nums > 6])

# Double every element of nums
nums_dbl = nums * 2
print(nums_dbl)

# Replace the third column of nums
nums[:,2] = nums[:,2] + 1
print(nums)

[ 6  7  8  9 10]
[ 7  8  9 10]
[[ 2  4  6  8 10]
 [12 14 16 18 20]]
[[ 1  2  4  4  5]
 [ 6  7  9  9 10]]


In [17]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Create a list of arrival times
arrival_times = [*range(10, 51, 10)]

print(arrival_times)

# Convert arrival_times to an array and update the times
arrival_times_np = np.array(arrival_times)
new_times = arrival_times_np - 3

print(new_times)

# Use list comprehension and enumerate to pair guests to new times
guest_arrivals = [(names[i],time) for i,time in enumerate(new_times)]

print(guest_arrivals)

[10, 20, 30, 40, 50]
[ 7 17 27 37 47]
[('Jerry', 7), ('Kramer', 17), ('Elaine', 27), ('George', 37), ('Newman', 47)]


In [None]:
# Map the welcome_guest function to each (guest,time) pair
welcome_map = map(welcome_guest, guest_arrivals) # welcome_guest is a function

guest_welcomes = [*welcome_map]
print(*guest_welcomes, sep='\n')
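
`welcome_guest` is not defined anywhere in these notes, so the cell above cannot run as written. A hypothetical version, assuming the function takes a `(name, time)` tuple and returns a greeting string, might look like the sketch below (the function body and message wording are assumptions, not the course's definition):

```python
def welcome_guest(guest_and_time):
    """Hypothetical helper: build a greeting from a (name, time) tuple."""
    name, time = guest_and_time
    return f"Welcome, {name}! Arrival time: {time} min."


# A shortened copy of the guest_arrivals list built earlier
guest_arrivals = [('Jerry', 7), ('Kramer', 17), ('Elaine', 27)]

# map() calls welcome_guest once per tuple; unpacking materialises the results
guest_welcomes = [*map(welcome_guest, guest_arrivals)]
print(*guest_welcomes, sep='\n')
```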

## Timing and profiling code

IPython magic commands:
* `%timeit` - times a statement
* `%lsmagic` - lists the available magic commands

In [2]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cd  %clear  %cls  %colors  %conda  %config  %connect_info  %copy  %ddir  %debug  %dhist  %dirs  %doctest_mode  %echo  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %macro  %magic  %matplotlib  %mkdir  %more  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %ren  %rep  %rerun  %reset  %reset_selective  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%cmd  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python 

### using `%timeit`

```python
import numpy as np
rand_nums = np.random.rand(1000)

%timeit rand_nums = np.random.rand(1000)

# specify the number of runs (-r) and loops per run (-n)
# here: 2 runs, 10 executions each
%timeit -r2 -n10 rand_nums = np.random.rand(1000)

# single line
%timeit nums = [x for x in range(10)]

# multiple lines

%%timeit

nums = []
for x in range(10):
    nums.append(x)

# save output -o

times = %timeit -o rand_nums = np.random.rand(1000)

times.timings #all times
times.best # best time
times.worst # worst time

```
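
The `%timeit` magics only exist inside IPython/Jupyter. In a plain Python script, the standard library's `timeit` module offers similar functionality. A minimal sketch, roughly mirroring `%timeit -r2 -n10` (the variable names are illustrative):

```python
import timeit

# repeat=2 corresponds to -r2 (runs), number=10 to -n10 (executions per run)
run_times = timeit.repeat(
    stmt="nums = [x for x in range(10)]",
    repeat=2,
    number=10,
)

# Each entry is the total time for `number` executions, in seconds
per_loop = [t / 10 for t in run_times]
print(f"best of {len(run_times)} runs: {min(per_loop):.3e} s per loop")
```

Unlike `%timeit`, the module does not choose the number of loops automatically unless you use `timeit.Timer.autorange()`, so `number` has to be picked by hand here.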


**python formal name**

* formal_list = list()
* formal_dict = dict()
* formal_tuple = tuple()

**python literal syntax**

* literal_list = []
* literal_dict = {}
* literal_tuple = ()

```python

# comparing literal and formal
f_time = %timeit -o formal_dict = dict()
l_time = %timeit -o literal_dict = {}

# compare
diff = (f_time.average - l_time.average) * (10**9)
print('l_time better than f_time {} ns'.format(diff))
```

In [3]:
# Create a list of integers (0-50) using list comprehension
nums_list_comp = [num for num in range(51)]
print(nums_list_comp)

# Create a list of integers (0-50) by unpacking range
nums_unpack = [*range(51)]
print(nums_unpack)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]


In [4]:
comp_time = %timeit nums_list_comp = [num for num in range(51)]
unpack_time = %timeit nums_unpack = [*range(51)]

798 ns ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
225 ns ± 6.66 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [5]:
# Create a list using the formal name
formal_list = list()
print(formal_list)

# Create a list using the literal syntax
literal_list = []
print(literal_list)

# Print out the type of formal_list
print(type(formal_list))

# Print out the type of literal_list
print(type(literal_list))

 #comparing literal and formal
f_time = %timeit -o formal_list = list()
l_time = %timeit -o literal_list = []

# compare
diff = (f_time.average - l_time.average) * (10**9)
print('l_time better than f_time {} ns'.format(diff))

[]
[]
<class 'list'>
<class 'list'>
35 ns ± 0.374 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
13.3 ns ± 0.112 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
l_time better than f_time 21.695059285712304 ns


In [None]:
# comparing numpy and standard loop

%%timeit
hero_wts_lbs = []
for wt in wts:
    hero_wts_lbs.append(wt * 2.20462)


%%timeit
wts_np = np.array(wts)
hero_wts_lbs_np = wts_np * 2.20462



### code profiling

* detailed stats on frequency and duration of function calls
* line-by-line analysis
* `line_profiler` package

In [7]:
#!pip install line_profiler
#load profiler
%load_ext line_profiler
# run profiler -f for function, name of function, function call and arguments
%lprun -f convert_units hero_wts_lbs = convert_units(heroes, heights, weights)

UsageError: Could not find module convert_units.
NameError: name 'convert_units' is not defined
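
The cell above fails because `convert_units` was never defined in this session. As a sketch of function-level profiling that needs no extra packages, the standard library's `cProfile` reports call counts and cumulative times (it is not line-by-line like `line_profiler`). The `convert_units` body and the sample data below are hypothetical stand-ins, since the course's function is not shown in these notes:

```python
import cProfile
import io
import pstats


def convert_units(heroes, heights, weights):
    """Hypothetical stand-in: convert cm to inches and kg to lbs."""
    new_hts = [ht * 0.39370 for ht in heights]
    new_wts = [wt * 2.20462 for wt in weights]
    return {hero: (ht, wt) for hero, ht, wt in zip(heroes, new_hts, new_wts)}


# Made-up sample data for illustration only
heroes = ['Batman', 'Superman', 'Wonder Woman']
heights = [188.0, 191.0, 183.0]
weights = [95.0, 101.0, 74.0]

# Profile only the call of interest
profiler = cProfile.Profile()
profiler.enable()
hero_data = convert_units(heroes, heights, weights)
profiler.disable()

# Collect the report, sorted by cumulative time, into a string and print it
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats()
print(stream.getvalue())
```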


### code profile for memory usage

* quick and dirty: 
`import sys`
`nums_list = [*range(1000)]`
`sys.getsizeof(nums_list)`

`sys.getsizeof()` # returns size of object in bytes
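
A short sketch of what `sys.getsizeof()` does and does not measure. Exact byte counts vary by Python version and platform, so only the relative sizes matter here:

```python
import sys

nums_range = range(1000)
nums_list = [*nums_range]

# A range object stores only start, stop and step, so its size
# does not grow with the number of values it represents
print(sys.getsizeof(nums_range))
print(sys.getsizeof(nums_list))

# getsizeof() reports the container only, not the objects it references:
# a list holding one big list is still tiny
nested = [nums_list]
print(sys.getsizeof(nested))
```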

**code profile memory**
* detailed stats on memory consumption
* line-by-line analyses
* `memory_profiler`
* the profiled function must be defined in a separate `.py` file and imported
* small memory increments may not show up in the report
* inspects memory by querying operating system
* results may differ between platforms and runs

```python
# the function must be imported from a standalone .py file
from hero_funcs import convert_units  # hero_funcs.py is a standalone file

# load the extension
%load_ext memory_profiler

# profile: -f names the function to profile, followed by the call to run
%mprun -f convert_units convert_units(heroes, hts, wts)
```
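
`memory_profiler` is a third-party package. Where installing it or moving code into a `.py` file is inconvenient, the standard library's `tracemalloc` can give a rough figure for a block of code. Note that this is a different technique: it tracks allocations made by Python itself rather than querying the operating system, so the numbers will not match `%mprun`. A minimal sketch (the list being built is only an example workload):

```python
import tracemalloc

tracemalloc.start()

# Code under measurement: build a list of 100,000 floats
wts_lbs = [wt * 2.20462 for wt in range(100_000)]

# Both values are in bytes: memory currently traced and the peak so far
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1024**2:.2f} MiB, peak: {peak / 1024**2:.2f} MiB")
```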

In [None]:
#!pip install memory_profiler

# Use get_publisher_heroes() to gather Star Wars heroes
star_wars_heroes = get_publisher_heroes(heroes, publishers, 'George Lucas')

print(star_wars_heroes)
print(type(star_wars_heroes))

# Use get_publisher_heroes_np() to gather Star Wars heroes
star_wars_heroes_np = get_publisher_heroes_np(heroes, publishers, 'George Lucas')

print(star_wars_heroes_np)
print(type(star_wars_heroes_np))





## Gaining Efficiencies

* combining lists
  * can use for loop and enumerate
  * better `zip_object = zip(list1, list2)`  -> `unpack_list = [*zip_object]`
* **collections** module includes: 
  * `namedtuple`: tuple subclasses with named fields
  * `deque`: list-like container with fast appends and pops
  * `Counter`: dict for counting hashable objects
  * `OrderedDict`: dict that retains the order of entries
  * `defaultdict`: dict that calls a factory function to supply missing values
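
As an illustration of the tools listed above, the sketch below combines two short lists with `zip()` and then uses `namedtuple`, `defaultdict` and `deque`. The four-item sample lists and the `Pokemon` tuple name are chosen for illustration only:

```python
from collections import namedtuple, deque, defaultdict

names = ['Bulbasaur', 'Charmander', 'Squirtle', 'Ivysaur']
primary_types = ['Grass', 'Fire', 'Water', 'Grass']

# zip() pairs items positionally; unpack it to get a list of tuples
names_types = [*zip(names, primary_types)]

# namedtuple: fields are accessible by name instead of by index
Pokemon = namedtuple('Pokemon', ['name', 'primary_type'])
pokedex = [Pokemon(n, t) for n, t in names_types]
print(pokedex[0].name, pokedex[0].primary_type)

# defaultdict: a missing key gets a fresh list from the factory
names_by_type = defaultdict(list)
for p in pokedex:
    names_by_type[p.primary_type].append(p.name)
print(dict(names_by_type))

# deque: fast appends and pops at both ends
queue = deque(names)
queue.appendleft('Pikachu')
last = queue.pop()
print(queue, last)
```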


**counting**

```python 

# counting with loop
poke_types = ['grass', 'dark', 'fire',...]
type_counts = {}
for poke_type in poke_types:
    if poke_type not in type_counts:
        type_counts[poke_type] = 1
    else:
        type_counts[poke_type] += 1

# using collections.Counter

from collections import Counter
type_counts = Counter(poke_types)

```
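
A runnable version of the `Counter` approach on a small, made-up list, showing two conveniences that the manual loop lacks: `most_common()` and a default count of zero for unseen keys.

```python
from collections import Counter

# Made-up sample; the full list is elided in the notes above
poke_types = ['grass', 'dark', 'fire', 'grass', 'fire', 'grass']
type_counts = Counter(poke_types)

print(type_counts)

# The two most frequent types, as (type, count) pairs
top_two = type_counts.most_common(2)
print(top_two)

# Missing keys return 0 instead of raising KeyError
print(type_counts['water'])
```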


**itertools**
* built-in module
* functional tools for creating and using iterators
* notable:
  * infinite iterators: `count`, `cycle`, `repeat`
  * finite iterators: `accumulate`, `chain`, `zip_longest`, etc.
  * combination generators: `product`, `permutations`, `combinations`

```python

# combine with loops
poke_types = ['grass', 'dark', 'fire',...]
combos = []

for x in poke_types:
    for y in poke_types:
        if x == y:
            continue
        if ((x, y) not in combos) and ((y, x) not in combos):
            combos.append((x, y))


# combine with itertools.combinations()

from itertools import combinations
combos_obj = combinations(poke_types, 2) # returns combination object
combos = [*combos_obj]

```
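
A small runnable check of `combinations()` on a made-up four-item list. Order within a pair does not matter, so each unordered pair appears exactly once and the total equals "n choose 2":

```python
from itertools import combinations
from math import comb

# Made-up sample; the full list is elided in the notes above
poke_types = ['grass', 'dark', 'fire', 'ghost']

combos = [*combinations(poke_types, 2)]
print(combos)

# The number of pairs should equal n choose 2
expected = comb(len(poke_types), 2)
print(len(combos), expected)
```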

In [17]:
import numpy as np

names = ['Abomasnow', 'Abra', 'Absol', 'Accelgor', 'Aerodactyl', 'Aggron', 'Aipom', 'Alakazam', 'Alomomola', 'Altaria', 'Amaura', 'Ambipom', 'Amoonguss', 'Ampharos', 'Anorith', 'Arbok', 'Arcanine', 'Arceus', 'Archen', 'Archeops', 'Ariados', 'Armaldo', 'Aromatisse', 'Aron', 'Articuno', 'Audino', 'Aurorus', 'Avalugg', 'Axew', 'Azelf', 'Azumarill', 'Azurill', 'Bagon', 'Baltoy', 'Banette', 'Barbaracle', 'Barboach', 'Basculin', 'Bastiodon', 'Bayleef', 'Beartic', 'Beautifly', 'Beedrill', 'Beheeyem', 'Beldum', 'Bellossom', 'Bellsprout', 'Bergmite', 'Bibarel', 'Bidoof', 'Binacle', 'Bisharp', 'Blastoise', 'Blaziken', 'Blissey', 'Blitzle', 'Boldore', 'Bonsly', 'Bouffalant', 'Braixen', 'Braviary', 'Breloom', 'Bronzong', 'Bronzor', 'Budew', 'Buizel', 'Bulbasaur', 'Buneary', 'Bunnelby', 'Burmy', 'Butterfree', 'Cacnea', 'Cacturne', 'Camerupt', 'Carbink', 'Carnivine', 'Carracosta', 'Carvanha', 'Cascoon', 'Castform', 'Caterpie', 'Celebi', 'Chandelure', 'Chansey', 'Charizard', 'Charmander', 'Charmeleon', 'Chatot', 'Cherrim', 'Cherubi', 'Chesnaught', 'Chespin', 'Chikorita', 'Chimchar', 'Chimecho', 'Chinchou', 'Chingling', 'Cinccino', 'Clamperl', 'Clauncher', 'Clawitzer', 'Claydol', 'Clefable', 'Clefairy', 'Cleffa', 'Cloyster', 'Cobalion', 'Cofagrigus', 'Combee', 'Combusken', 'Conkeldurr', 'Corphish', 'Corsola', 'Cottonee', 'Cradily', 'Cranidos', 'Crawdaunt', 'Cresselia', 'Croagunk', 'Crobat', 'Croconaw', 'Crustle', 'Cryogonal', 'Cubchoo', 'Cubone', 'Cyndaquil', 'Darkrai', 'DarmanitanStandard Mode', 'DarmanitanZen Mode', 'Darumaka', 'Dedenne', 'Deerling', 'Deino', 'Delcatty', 'Delibird', 'Delphox', 'Dewgong', 'Dewott', 'Dialga', 'Diancie', 'Diggersby', 'Diglett', 'Ditto', 'Dodrio', 'Doduo', 'Donphan', 'Doublade', 'Dragalge', 'Dragonair', 'Dragonite', 'Drapion', 'Dratini', 'Drifblim', 'Drifloon', 'Drilbur', 'Drowzee', 'Druddigon', 'Ducklett', 'Dugtrio', 'Dunsparce', 'Duosion', 'Durant', 'Dusclops', 'Dusknoir', 'Duskull', 'Dustox', 'Dwebble', 'Eelektrik', 'Eelektross', 'Eevee', 
'Ekans', 'Electabuzz', 'Electivire', 'Electrike', 'Electrode', 'Elekid', 'Elgyem', 'Emboar', 'Emolga', 'Empoleon', 'Entei', 'Escavalier', 'Espeon', 'Espurr', 'Excadrill', 'Exeggcute', 'Exeggutor', 'Exploud', "Farfetch'd", 'Fearow', 'Feebas', 'Fennekin', 'Feraligatr', 'Ferroseed', 'Ferrothorn', 'Finneon', 'Flaaffy', 'Flabébé', 'Flareon', 'Fletchinder', 'Fletchling', 'Floatzel', 'Floette', 'Florges', 'Flygon', 'Foongus', 'Forretress', 'Fraxure', 'Frillish', 'Froakie', 'Frogadier', 'Froslass', 'Furfrou', 'Furret', 'Gabite', 'Gallade', 'Galvantula', 'Garbodor', 'Garchomp', 'Gardevoir', 'Gastly', 'Gastrodon', 'Genesect', 'Gengar', 'Geodude', 'Gible', 'Gigalith', 'Girafarig', 'Glaceon', 'Glalie', 'Glameow', 'Gligar', 'Gliscor', 'Gloom', 'Gogoat', 'Golbat', 'Goldeen', 'Golduck', 'Golem', 'Golett', 'Golurk', 'Goodra', 'Goomy', 'Gorebyss', 'Gothita', 'Gothitelle', 'Gothorita', 'Granbull', 'Graveler', 'Greninja', 'Grimer', 'Grotle', 'Groudon', 'GroudonPrimal Groudon', 'Grovyle', 'Growlithe', 'Grumpig', 'Gulpin', 'Gurdurr', 'Gyarados', 'Happiny', 'Hariyama', 'Haunter', 'Hawlucha', 'Haxorus', 'Heatmor', 'Heatran', 'Heliolisk', 'Helioptile', 'Heracross', 'Herdier', 'Hippopotas', 'Hippowdon', 'Hitmonchan', 'Hitmonlee', 'Hitmontop', 'Ho-oh', 'Honchkrow', 'Honedge', 'Hoothoot', 'Hoppip', 'Horsea', 'Houndoom', 'Houndour', 'Huntail', 'Hydreigon', 'Hypno', 'Igglybuff', 'Illumise', 'Infernape', 'Inkay', 'Ivysaur', 'Jellicent', 'Jigglypuff', 'Jirachi', 'Jolteon', 'Joltik', 'Jumpluff', 'Jynx', 'Kabuto', 'Kabutops', 'Kadabra', 'Kakuna', 'Kangaskhan', 'Karrablast', 'Kecleon', 'Kingdra', 'Kingler', 'Kirlia', 'Klang', 'Klefki', 'Klink', 'Klinklang', 'Koffing', 'Krabby', 'Kricketot', 'Kricketune', 'Krokorok', 'Krookodile', 'Kyogre', 'KyogrePrimal Kyogre', 'Kyurem', 'KyuremBlack Kyurem', 'KyuremWhite Kyurem', 'Lairon', 'Lampent', 'Lanturn', 'Lapras', 'Larvesta', 'Larvitar', 'Latias', 'Latios', 'Leafeon', 'Leavanny', 'Ledian', 'Ledyba', 'Lickilicky', 'Lickitung', 'Liepard', 'Lileep', 
'Lilligant', 'Lillipup', 'Linoone', 'Litleo', 'Litwick', 'Lombre', 'Lopunny', 'Lotad', 'Loudred', 'Lucario', 'Ludicolo', 'Lugia', 'Lumineon', 'Lunatone', 'Luvdisc', 'Luxio', 'Luxray', 'Machamp', 'Machoke', 'Machop', 'Magby', 'Magcargo', 'Magikarp', 'Magmar', 'Magmortar', 'Magnemite', 'Magneton', 'Magnezone', 'Makuhita', 'Malamar', 'Mamoswine', 'Manaphy', 'Mandibuzz', 'Manectric', 'Mankey', 'Mantine', 'Mantyke', 'Maractus', 'Mareep', 'Marill', 'Marowak', 'Marshtomp', 'Masquerain', 'Mawile', 'Medicham', 'Meditite', 'MeowsticFemale', 'MeowsticMale', 'Meowth', 'Mesprit', 'Metagross', 'Metang', 'Metapod', 'Mew', 'Mewtwo', 'Mienfoo', 'Mienshao', 'Mightyena', 'Milotic', 'Miltank', 'Mime Jr.', 'Minccino', 'Minun', 'Misdreavus', 'Mismagius', 'Moltres', 'Monferno', 'Mothim', 'Mr. Mime', 'Mudkip', 'Muk', 'Munchlax', 'Munna', 'Murkrow', 'Musharna', 'Natu', 'Nidoking', 'Nidoqueen', 'Nidoran♀', 'Nidoran♂', 'Nidorina', 'Nidorino', 'Nincada', 'Ninetales', 'Ninjask', 'Noctowl', 'Noibat', 'Noivern', 'Nosepass', 'Numel', 'Nuzleaf', 'Octillery', 'Oddish', 'Omanyte', 'Omastar', 'Onix', 'Oshawott', 'Pachirisu', 'Palkia', 'Palpitoad', 'Pancham', 'Pangoro', 'Panpour', 'Pansage', 'Pansear', 'Paras', 'Parasect', 'Patrat', 'Pawniard', 'Pelipper', 'Persian', 'Petilil', 'Phanpy', 'Phantump', 'Phione', 'Pichu', 'Pidgeot', 'Pidgeotto', 'Pidgey', 'Pidove', 'Pignite', 'Pikachu', 'Piloswine', 'Pineco', 'Pinsir', 'Piplup', 'Plusle', 'Politoed', 'Poliwag', 'Poliwhirl', 'Poliwrath', 'Ponyta', 'Poochyena', 'Porygon', 'Porygon-Z', 'Porygon2', 'Primeape', 'Prinplup', 'Probopass', 'Psyduck', 'Pupitar', 'Purrloin', 'Purugly', 'Pyroar', 'Quagsire', 'Quilava', 'Quilladin', 'Qwilfish', 'Raichu', 'Raikou', 'Ralts', 'Rampardos', 'Rapidash', 'Raticate', 'Rattata', 'Rayquaza', 'Regice', 'Regigigas', 'Regirock', 'Registeel', 'Relicanth', 'Remoraid', 'Reshiram', 'Reuniclus', 'Rhydon', 'Rhyhorn', 'Rhyperior', 'Riolu', 'Roggenrola', 'Roselia', 'Roserade', 'Rotom', 'RotomFan Rotom', 'RotomFrost Rotom', 
'RotomHeat Rotom', 'RotomMow Rotom', 'RotomWash Rotom', 'Rufflet', 'Sableye', 'Salamence', 'Samurott', 'Sandile', 'Sandshrew', 'Sandslash', 'Sawk', 'Sawsbuck', 'Scatterbug', 'Sceptile', 'Scizor', 'Scolipede', 'Scrafty', 'Scraggy', 'Scyther', 'Seadra', 'Seaking', 'Sealeo', 'Seedot', 'Seel', 'Seismitoad', 'Sentret', 'Serperior', 'Servine', 'Seviper', 'Sewaddle', 'Sharpedo', 'Shedinja', 'Shelgon', 'Shellder', 'Shellos', 'Shelmet', 'Shieldon', 'Shiftry', 'Shinx', 'Shroomish', 'Shuckle', 'Shuppet', 'Sigilyph', 'Silcoon', 'Simipour', 'Simisage', 'Simisear', 'Skarmory', 'Skiddo', 'Skiploom', 'Skitty', 'Skorupi', 'Skrelp', 'Skuntank', 'Slaking', 'Slakoth', 'Sliggoo', 'Slowbro', 'Slowking', 'Slowpoke', 'Slugma', 'Slurpuff', 'Smeargle', 'Smoochum', 'Sneasel', 'Snivy', 'Snorlax', 'Snorunt', 'Snover', 'Snubbull', 'Solosis', 'Solrock', 'Spearow', 'Spewpa', 'Spheal', 'Spinarak', 'Spinda', 'Spiritomb', 'Spoink', 'Spritzee', 'Squirtle', 'Stantler', 'Staraptor', 'Staravia', 'Starly', 'Starmie', 'Staryu', 'Steelix', 'Stoutland', 'Stunfisk', 'Stunky', 'Sudowoodo', 'Suicune', 'Sunflora', 'Sunkern', 'Surskit', 'Swablu', 'Swadloon', 'Swalot', 'Swampert', 'Swanna', 'Swellow', 'Swinub', 'Swirlix', 'Swoobat', 'Sylveon', 'Taillow', 'Talonflame', 'Tangela', 'Tangrowth', 'Tauros', 'Teddiursa', 'Tentacool', 'Tentacruel', 'Tepig', 'Terrakion', 'Throh', 'Timburr', 'Tirtouga', 'Togekiss', 'Togepi', 'Togetic', 'Torchic', 'Torkoal', 'Torterra', 'Totodile', 'Toxicroak', 'Tranquill', 'Trapinch', 'Treecko', 'Trevenant', 'Tropius', 'Trubbish', 'Turtwig', 'Tympole', 'Tynamo', 'Typhlosion', 'Tyranitar', 'Tyrantrum', 'Tyrogue', 'Tyrunt', 'Umbreon', 'Unfezant', 'Unown', 'Ursaring', 'Uxie', 'Vanillish', 'Vanillite', 'Vanilluxe', 'Vaporeon', 'Venipede', 'Venomoth', 'Venonat', 'Venusaur', 'Vespiquen', 'Vibrava', 'Victini', 'Victreebel', 'Vigoroth', 'Vileplume', 'Virizion', 'Vivillon', 'Volbeat', 'Volcanion', 'Volcarona', 'Voltorb', 'Vullaby', 'Vulpix', 'Wailmer', 'Wailord', 'Walrein', 'Wartortle', 'Watchog', 
'Weavile', 'Weedle', 'Weepinbell', 'Weezing', 'Whimsicott', 'Whirlipede', 'Whiscash', 'Whismur', 'Wigglytuff', 'Wingull', 'Wobbuffet', 'Woobat', 'Wooper', 'WormadamPlant Cloak', 'WormadamSandy Cloak', 'WormadamTrash Cloak', 'Wurmple', 'Wynaut', 'Xatu', 'Xerneas', 'Yamask', 'Yanma', 'Yanmega', 'Yveltal', 'Zangoose', 'Zapdos', 'Zebstrika', 'Zekrom', 'Zigzagoon', 'Zoroark', 'Zorua', 'Zubat', 'Zweilous']

primary_types = ['Grass', 'Psychic', 'Dark', 'Bug', 'Rock', 'Steel', 'Normal', 'Psychic', 'Water', 'Dragon', 'Rock', 'Normal', 'Grass', 'Electric', 'Rock', 'Poison', 'Fire', 'Normal', 'Rock', 'Rock', 'Bug', 'Rock', 'Fairy', 'Steel', 'Ice', 'Normal', 'Rock', 'Ice', 'Dragon', 'Psychic', 'Water', 'Normal', 'Dragon', 'Ground', 'Ghost', 'Rock', 'Water', 'Water', 'Rock', 'Grass', 'Ice', 'Bug', 'Bug', 'Psychic', 'Steel', 'Grass', 'Grass', 'Ice', 'Normal', 'Normal', 'Rock', 'Dark', 'Water', 'Fire', 'Normal', 'Electric', 'Rock', 'Rock', 'Normal', 'Fire', 'Normal', 'Grass', 'Steel', 'Steel', 'Grass', 'Water', 'Grass', 'Normal', 'Normal', 'Bug', 'Bug', 'Grass', 'Grass', 'Fire', 'Rock', 'Grass', 'Water', 'Water', 'Bug', 'Normal', 'Bug', 'Psychic', 'Ghost', 'Normal', 'Fire', 'Fire', 'Fire', 'Normal', 'Grass', 'Grass', 'Grass', 'Grass', 'Grass', 'Fire', 'Psychic', 'Water', 'Psychic', 'Normal', 'Water', 'Water', 'Water', 'Ground', 'Fairy', 'Fairy', 'Fairy', 'Water', 'Steel', 'Ghost', 'Bug', 'Fire', 'Fighting', 'Water', 'Water', 'Grass', 'Rock', 'Rock', 'Water', 'Psychic', 'Poison', 'Poison', 'Water', 'Bug', 'Ice', 'Ice', 'Ground', 'Fire', 'Dark', 'Fire', 'Fire', 'Fire', 'Electric', 'Normal', 'Dark', 'Normal', 'Ice', 'Fire', 'Water', 'Water', 'Steel', 'Rock', 'Normal', 'Ground', 'Normal', 'Normal', 'Normal', 'Ground', 'Steel', 'Poison', 'Dragon', 'Dragon', 'Poison', 'Dragon', 'Ghost', 'Ghost', 'Ground', 'Psychic', 'Dragon', 'Water', 'Ground', 'Normal', 'Psychic', 'Bug', 'Ghost', 'Ghost', 'Ghost', 'Bug', 'Bug', 'Electric', 'Electric', 'Normal', 'Poison', 'Electric', 'Electric', 'Electric', 'Electric', 'Electric', 'Psychic', 'Fire', 'Electric', 'Water', 'Fire', 'Bug', 'Psychic', 'Psychic', 'Ground', 'Grass', 'Grass', 'Normal', 'Normal', 'Normal', 'Water', 'Fire', 'Water', 'Grass', 'Grass', 'Water', 'Electric', 'Fairy', 'Fire', 'Fire', 'Normal', 'Water', 'Fairy', 'Fairy', 'Ground', 'Grass', 'Bug', 'Dragon', 'Water', 'Water', 'Water', 'Ice', 'Normal', 'Normal', 'Dragon', 'Psychic', 
'Bug', 'Poison', 'Dragon', 'Psychic', 'Ghost', 'Water', 'Bug', 'Ghost', 'Rock', 'Dragon', 'Rock', 'Normal', 'Ice', 'Ice', 'Normal', 'Ground', 'Ground', 'Grass', 'Grass', 'Poison', 'Water', 'Water', 'Rock', 'Ground', 'Ground', 'Dragon', 'Dragon', 'Water', 'Psychic', 'Psychic', 'Psychic', 'Fairy', 'Rock', 'Water', 'Poison', 'Grass', 'Ground', 'Ground', 'Grass', 'Fire', 'Psychic', 'Poison', 'Fighting', 'Water', 'Normal', 'Fighting', 'Ghost', 'Fighting', 'Dragon', 'Fire', 'Fire', 'Electric', 'Electric', 'Bug', 'Normal', 'Ground', 'Ground', 'Fighting', 'Fighting', 'Fighting', 'Fire', 'Dark', 'Steel', 'Normal', 'Grass', 'Water', 'Dark', 'Dark', 'Water', 'Dark', 'Psychic', 'Normal', 'Bug', 'Fire', 'Dark', 'Grass', 'Water', 'Normal', 'Steel', 'Electric', 'Bug', 'Grass', 'Ice', 'Rock', 'Rock', 'Psychic', 'Bug', 'Normal', 'Bug', 'Normal', 'Water', 'Water', 'Psychic', 'Steel', 'Steel', 'Steel', 'Steel', 'Poison', 'Water', 'Bug', 'Bug', 'Ground', 'Ground', 'Water', 'Water', 'Dragon', 'Dragon', 'Dragon', 'Steel', 'Ghost', 'Water', 'Water', 'Bug', 'Rock', 'Dragon', 'Dragon', 'Grass', 'Bug', 'Bug', 'Bug', 'Normal', 'Normal', 'Dark', 'Rock', 'Grass', 'Normal', 'Normal', 'Fire', 'Ghost', 'Water', 'Normal', 'Water', 'Normal', 'Fighting', 'Water', 'Psychic', 'Water', 'Rock', 'Water', 'Electric', 'Electric', 'Fighting', 'Fighting', 'Fighting', 'Fire', 'Fire', 'Water', 'Fire', 'Fire', 'Electric', 'Electric', 'Electric', 'Fighting', 'Dark', 'Ice', 'Water', 'Dark', 'Electric', 'Fighting', 'Water', 'Water', 'Grass', 'Electric', 'Water', 'Ground', 'Water', 'Bug', 'Steel', 'Fighting', 'Fighting', 'Psychic', 'Psychic', 'Normal', 'Psychic', 'Steel', 'Steel', 'Bug', 'Psychic', 'Psychic', 'Fighting', 'Fighting', 'Dark', 'Water', 'Normal', 'Psychic', 'Normal', 'Electric', 'Ghost', 'Ghost', 'Fire', 'Fire', 'Bug', 'Psychic', 'Water', 'Poison', 'Normal', 'Psychic', 'Dark', 'Psychic', 'Psychic', 'Poison', 'Poison', 'Poison', 'Poison', 'Poison', 'Poison', 'Bug', 'Fire', 'Bug', 'Normal', 'Flying', 
'Flying', 'Rock', 'Fire', 'Grass', 'Water', 'Grass', 'Rock', 'Rock', 'Rock', 'Water', 'Electric', 'Water', 'Water', 'Fighting', 'Fighting', 'Water', 'Grass', 'Fire', 'Bug', 'Bug', 'Normal', 'Dark', 'Water', 'Normal', 'Grass', 'Ground', 'Ghost', 'Water', 'Electric', 'Normal', 'Normal', 'Normal', 'Normal', 'Fire', 'Electric', 'Ice', 'Bug', 'Bug', 'Water', 'Electric', 'Water', 'Water', 'Water', 'Water', 'Fire', 'Dark', 'Normal', 'Normal', 'Normal', 'Fighting', 'Water', 'Rock', 'Water', 'Rock', 'Dark', 'Normal', 'Fire', 'Water', 'Fire', 'Grass', 'Water', 'Electric', 'Electric', 'Psychic', 'Rock', 'Fire', 'Normal', 'Normal', 'Dragon', 'Ice', 'Normal', 'Rock', 'Steel', 'Water', 'Water', 'Dragon', 'Psychic', 'Ground', 'Ground', 'Ground', 'Fighting', 'Rock', 'Grass', 'Grass', 'Electric', 'Electric', 'Electric', 'Electric', 'Electric', 'Electric', 'Normal', 'Dark', 'Dragon', 'Water', 'Ground', 'Ground', 'Ground', 'Fighting', 'Normal', 'Bug', 'Grass', 'Bug', 'Bug', 'Dark', 'Dark', 'Bug', 'Water', 'Water', 'Ice', 'Grass', 'Water', 'Water', 'Normal', 'Grass', 'Grass', 'Poison', 'Bug', 'Water', 'Bug', 'Dragon', 'Water', 'Water', 'Bug', 'Rock', 'Grass', 'Electric', 'Grass', 'Bug', 'Ghost', 'Psychic', 'Bug', 'Water', 'Grass', 'Fire', 'Steel', 'Grass', 'Grass', 'Normal', 'Poison', 'Poison', 'Poison', 'Normal', 'Normal', 'Dragon', 'Water', 'Water', 'Water', 'Fire', 'Fairy', 'Normal', 'Ice', 'Dark', 'Grass', 'Normal', 'Ice', 'Grass', 'Fairy', 'Psychic', 'Rock', 'Normal', 'Bug', 'Ice', 'Bug', 'Normal', 'Ghost', 'Psychic', 'Fairy', 'Water', 'Normal', 'Normal', 'Normal', 'Normal', 'Water', 'Water', 'Steel', 'Normal', 'Ground', 'Poison', 'Rock', 'Water', 'Grass', 'Grass', 'Bug', 'Normal', 'Bug', 'Poison', 'Water', 'Water', 'Normal', 'Ice', 'Fairy', 'Psychic', 'Fairy', 'Normal', 'Fire', 'Grass', 'Grass', 'Normal', 'Normal', 'Water', 'Water', 'Fire', 'Rock', 'Fighting', 'Fighting', 'Water', 'Fairy', 'Fairy', 'Fairy', 'Fire', 'Fire', 'Grass', 'Water', 'Poison', 'Normal', 'Ground', 'Grass', 
'Ghost', 'Grass', 'Poison', 'Grass', 'Water', 'Electric', 'Fire', 'Rock', 'Rock', 'Fighting', 'Rock', 'Dark', 'Normal', 'Psychic', 'Normal', 'Psychic', 'Ice', 'Ice', 'Ice', 'Water', 'Bug', 'Bug', 'Bug', 'Grass', 'Bug', 'Ground', 'Psychic', 'Grass', 'Normal', 'Grass', 'Grass', 'Bug', 'Bug', 'Fire', 'Bug', 'Electric', 'Dark', 'Fire', 'Water', 'Water', 'Ice', 'Water', 'Normal', 'Dark', 'Bug', 'Grass', 'Poison', 'Grass', 'Bug', 'Water', 'Normal', 'Normal', 'Water', 'Psychic', 'Psychic', 'Water', 'Bug', 'Bug', 'Bug', 'Bug', 'Psychic', 'Psychic', 'Fairy', 'Ghost', 'Bug', 'Bug', 'Dark', 'Normal', 'Electric', 'Electric', 'Dragon', 'Normal', 'Dark', 'Dark', 'Poison', 'Dark']

secondary_types = ['Ice', np.nan, np.nan, np.nan, 'Flying', 'Rock', np.nan, np.nan, np.nan, 'Flying', 'Ice', np.nan, 'Poison', np.nan, 'Bug', np.nan, np.nan, np.nan, 'Flying', 'Flying', 'Poison', 'Bug', np.nan, 'Rock', 'Flying', np.nan, 'Ice', np.nan, np.nan, np.nan, 'Fairy', 'Fairy', np.nan, 'Psychic', np.nan, 'Water', 'Ground', np.nan, 'Steel', np.nan, np.nan, 'Flying', 'Poison', np.nan, 'Psychic', np.nan, 'Poison', np.nan, 'Water', np.nan, 'Water', 'Steel', np.nan, 'Fighting', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Flying', 'Fighting', 'Psychic', 'Psychic', 'Poison', np.nan, 'Poison', np.nan, np.nan, np.nan, 'Flying', np.nan, 'Dark', 'Ground', 'Fairy', np.nan, 'Rock', 'Dark', np.nan, np.nan, np.nan, 'Grass', 'Fire', np.nan, 'Flying', np.nan, np.nan, 'Flying', np.nan, np.nan, 'Fighting', np.nan, np.nan, np.nan, np.nan, 'Electric', np.nan, np.nan, np.nan, np.nan, np.nan, 'Psychic', np.nan, np.nan, np.nan, 'Ice', 'Fighting', np.nan, 'Flying', 'Fighting', np.nan, np.nan, 'Rock', 'Fairy', 'Grass', np.nan, 'Dark', np.nan, 'Fighting', 'Flying', np.nan, 'Rock', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Psychic', np.nan, 'Fairy', 'Grass', 'Dragon', np.nan, 'Flying', 'Psychic', 'Ice', np.nan, 'Dragon', 'Fairy', 'Ground', np.nan, np.nan, 'Flying', 'Flying', np.nan, 'Ghost', 'Dragon', np.nan, 'Flying', 'Dark', np.nan, 'Flying', 'Flying', np.nan, np.nan, np.nan, 'Flying', np.nan, np.nan, np.nan, 'Steel', np.nan, np.nan, np.nan, 'Poison', 'Rock', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Fighting', 'Flying', 'Steel', np.nan, 'Steel', np.nan, np.nan, 'Steel', 'Psychic', 'Psychic', np.nan, 'Flying', 'Flying', np.nan, np.nan, np.nan, 'Steel', 'Steel', np.nan, np.nan, np.nan, np.nan, 'Flying', 'Flying', np.nan, np.nan, np.nan, 'Dragon', 'Poison', 'Steel', np.nan, 'Ghost', np.nan, np.nan, 'Ghost', np.nan, np.nan, 'Ground', 'Fighting', 'Electric', np.nan, 'Ground', 'Fairy', 'Poison', 'Ground', 'Steel', 'Poison', 'Ground', 
'Ground', np.nan, 'Psychic', np.nan, np.nan, np.nan, 'Flying', 'Flying', 'Poison', np.nan, 'Flying', np.nan, np.nan, 'Ground', 'Ghost', 'Ghost', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Ground', 'Dark', np.nan, np.nan, np.nan, 'Fire', np.nan, np.nan, np.nan, np.nan, np.nan, 'Flying', np.nan, np.nan, 'Poison', 'Flying', np.nan, np.nan, 'Steel', 'Normal', 'Normal', 'Fighting', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Flying', 'Flying', 'Ghost', 'Flying', 'Flying', np.nan, 'Fire', 'Fire', np.nan, 'Dragon', np.nan, 'Fairy', np.nan, 'Fighting', 'Psychic', 'Poison', 'Ghost', 'Fairy', 'Psychic', np.nan, 'Electric', 'Flying', 'Psychic', 'Water', 'Water', np.nan, 'Poison', np.nan, np.nan, np.nan, 'Dragon', np.nan, 'Fairy', np.nan, 'Fairy', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Dark', 'Dark', np.nan, np.nan, 'Ice', 'Ice', 'Ice', 'Rock', 'Fire', 'Electric', 'Ice', 'Fire', 'Ground', 'Psychic', 'Psychic', np.nan, 'Grass', 'Flying', 'Flying', np.nan, np.nan, np.nan, 'Grass', np.nan, np.nan, np.nan, 'Normal', 'Fire', 'Grass', np.nan, 'Grass', np.nan, 'Steel', 'Grass', 'Flying', np.nan, 'Psychic', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Rock', np.nan, np.nan, np.nan, 'Steel', 'Steel', 'Steel', np.nan, 'Psychic', 'Ground', np.nan, 'Flying', np.nan, np.nan, 'Flying', 'Flying', np.nan, np.nan, 'Fairy', np.nan, 'Ground', 'Flying', 'Fairy', 'Psychic', 'Psychic', np.nan, np.nan, np.nan, np.nan, 'Psychic', 'Psychic', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Fairy', np.nan, np.nan, np.nan, np.nan, 'Flying', 'Fighting', 'Flying', 'Fairy', np.nan, np.nan, np.nan, np.nan, 'Flying', np.nan, 'Flying', 'Ground', 'Ground', np.nan, np.nan, np.nan, np.nan, 'Ground', np.nan, 'Flying', 'Flying', 'Dragon', 'Dragon', np.nan, 'Ground', 'Dark', np.nan, 'Poison', 'Water', 'Water', 'Ground', np.nan, np.nan, 'Dragon', 'Ground', np.nan, 'Dark', np.nan, np.nan, np.nan, 'Grass', 'Grass', np.nan, 'Steel', 'Flying', np.nan, np.nan, 
np.nan, 'Grass', np.nan, np.nan, 'Flying', 'Flying', 'Flying', 'Flying', 'Fighting', np.nan, 'Ground', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Fighting', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Steel', np.nan, 'Ground', np.nan, np.nan, 'Normal', 'Ground', np.nan, np.nan, 'Poison', np.nan, np.nan, 'Fairy', np.nan, np.nan, np.nan, np.nan, 'Flying', np.nan, np.nan, np.nan, np.nan, 'Rock', np.nan, 'Fire', np.nan, 'Rock', 'Rock', 'Rock', np.nan, np.nan, 'Poison', 'Poison', 'Ghost', 'Flying', 'Ice', 'Fire', 'Grass', 'Water', 'Flying', 'Ghost', 'Flying', np.nan, 'Dark', np.nan, np.nan, np.nan, 'Grass', np.nan, np.nan, 'Steel', 'Poison', 'Fighting', 'Fighting', 'Flying', np.nan, np.nan, 'Water', np.nan, np.nan, 'Ground', np.nan, np.nan, np.nan, np.nan, 'Grass', 'Dark', 'Ghost', np.nan, np.nan, np.nan, np.nan, 'Steel', 'Dark', np.nan, np.nan, 'Rock', np.nan, 'Flying', np.nan, np.nan, np.nan, np.nan, 'Flying', np.nan, 'Flying', np.nan, 'Bug', 'Water', 'Dark', np.nan, np.nan, np.nan, 'Psychic', 'Psychic', 'Psychic', np.nan, np.nan, np.nan, 'Psychic', 'Ice', np.nan, np.nan, np.nan, 'Ice', np.nan, np.nan, 'Psychic', 'Flying', np.nan, 'Water', 'Poison', np.nan, 'Dark', np.nan, np.nan, np.nan, np.nan, 'Flying', 'Flying', 'Flying', 'Psychic', np.nan, 'Ground', np.nan, 'Electric', 'Dark', np.nan, np.nan, np.nan, np.nan, 'Water', 'Flying', 'Grass', np.nan, 'Ground', 'Flying', 'Flying', 'Ground', np.nan, 'Flying', np.nan, 'Flying', 'Flying', np.nan, np.nan, np.nan, np.nan, 'Poison', 'Poison', np.nan, 'Fighting', np.nan, np.nan, 'Rock', 'Flying', np.nan, 'Flying', np.nan, np.nan, 'Ground', np.nan, 'Fighting', 'Flying', np.nan, np.nan, 'Grass', 'Flying', np.nan, np.nan, np.nan, np.nan, np.nan, 'Dark', 'Dragon', np.nan, 'Dragon', np.nan, 'Flying', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Poison', 'Poison', 'Poison', 'Poison', 'Flying', 'Dragon', 'Fire', 'Poison', np.nan, 'Poison', 'Fighting', 'Flying', np.nan, 'Water', 'Fire', np.nan, 
'Flying', np.nan, np.nan, np.nan, 'Water', np.nan, np.nan, 'Ice', 'Poison', 'Poison', np.nan, 'Fairy', 'Poison', 'Ground', np.nan, 'Fairy', 'Flying', np.nan, 'Flying', 'Ground', 'Grass', 'Ground', 'Steel', np.nan, np.nan, 'Flying', np.nan, np.nan, 'Flying', 'Flying', 'Flying', np.nan, 'Flying', np.nan, 'Electric', np.nan, np.nan, np.nan, 'Flying', 'Dragon']

generations = [1, 1, 1, 5, 3, 5, 1, 6, 1, 6, 5, 5, 4, 6, 3, 4, 2, 5, 2, 5, 4, 1, 1, 2, 6, 5, 5, 6, 6, 1, 4, 5, 6, 2, 6, 1, 3, 2, 4, 1, 5, 3, 5, 5, 1, 5, 5, 5, 5, 6, 1, 3, 4, 6, 1, 4, 5, 3, 5, 5, 1, 4, 1, 1, 5, 6, 5, 1, 1, 6, 5, 5, 4, 6, 1, 1, 4, 5, 4, 5, 6, 2, 3, 5, 6, 5, 3, 4, 5, 1, 5, 6, 1, 1, 2, 3, 3, 3, 4, 4, 1, 3, 6, 3, 5, 3, 5, 3, 3, 1, 3, 6, 4, 4, 4, 5, 3, 4, 4, 3, 5, 5, 3, 5, 4, 1, 1, 3, 5, 3, 2, 5, 4, 3, 2, 4, 3, 5, 3, 1, 2, 4, 3, 5, 3, 5, 4, 1, 2, 4, 3, 5, 5, 1, 4, 6, 3, 6, 3, 4, 1, 5, 6, 1, 5, 4, 4, 3, 3, 5, 2, 3, 1, 6, 5, 1, 5, 4, 3, 6, 1, 3, 3, 6, 4, 3, 5, 4, 2, 4, 4, 1, 2, 5, 1, 3, 6, 4, 1, 1, 1, 1, 2, 4, 1, 1, 4, 4, 5, 3, 1, 4, 5, 3, 4, 1, 3, 4, 2, 5, 3, 4, 1, 1, 1, 5, 1, 4, 4, 3, 4, 3, 5, 3, 2, 3, 3, 3, 2, 4, 1, 3, 4, 2, 6, 5, 2, 5, 5, 1, 1, 1, 5, 4, 2, 4, 2, 2, 5, 5, 5, 4, 2, 3, 3, 5, 4, 5, 6, 3, 1, 2, 4, 2, 5, 1, 4, 3, 1, 1, 1, 1, 3, 5, 1, 3, 3, 3, 3, 5, 2, 5, 4, 2, 2, 3, 6, 4, 2, 1, 2, 5, 5, 3, 1, 3, 5, 5, 5, 5, 6, 5, 1, 5, 1, 5, 5, 1, 6, 4, 3, 1, 6, 5, 1, 4, 6, 5, 1, 2, 5, 5, 3, 1, 5, 5, 3, 3, 3, 6, 1, 5, 1, 3, 3, 4, 5, 3, 1, 1, 1, 6, 3, 3, 4, 5, 3, 5, 4, 5, 1, 5, 5, 3, 5, 6, 5, 4, 5, 6, 1, 1, 4, 1, 3, 4, 1, 4, 1, 1, 5, 4, 4, 5, 5, 6, 1, 1, 2, 4, 2, 5, 2, 5, 5, 6, 1, 3, 3, 5, 6, 4, 3, 3, 1, 5, 5, 2, 3, 5, 3, 6, 3, 5, 2, 1, 2, 5, 2, 3, 3, 1, 3, 1, 1, 4, 3, 1, 3, 5, 1, 3, 6, 4, 5, 2, 2, 6, 2, 2, 5, 6, 2, 1, 3, 5, 3, 3, 5, 5, 5, 3, 5, 2, 2, 4, 4, 3, 5, 6, 1, 4, 1, 1, 4, 2, 3, 4, 5, 2, 4, 3, 3, 4, 3, 3, 3, 3, 1, 2, 3, 5, 5, 6, 4, 3, 5, 5, 5, 6, 1, 2, 3, 1, 5, 6, 2, 1, 3, 5]

pokemon = ['Geodude', 'Cubone', 'Lickitung', 'Persian', 'Diglett']

In [9]:
# Combine names and primary_types
names_type1 = [*zip(names, primary_types)]

# print the first 5 elements in names_type1
print(*names_type1[:5], sep='\n')

('Abomasnow', 'Grass')
('Abra', 'Psychic')
('Absol', 'Dark')
('Accelgor', 'Bug')
('Aerodactyl', 'Rock')


In [10]:
# Combine all three lists together
names_types = [*zip(names, primary_types, secondary_types)]

print(*names_types[:5], sep='\n')

('Abomasnow', 'Grass', 'Ice')
('Abra', 'Psychic', nan)
('Absol', 'Dark', nan)
('Accelgor', 'Bug', nan)
('Aerodactyl', 'Rock', 'Flying')


In [11]:
# Combine five items from names and three items from primary_types
differing_lengths = [*zip(names[:5], primary_types[:3])]

print(*differing_lengths, sep='\n')

('Abomasnow', 'Grass')
('Abra', 'Psychic')
('Absol', 'Dark')


In [16]:
from collections import Counter

# Collect the count of primary types
type_count = Counter(primary_types)
print(type_count, '\n')

# Collect the count of generations
gen_count = Counter(generations)
print(gen_count, '\n')

# Use list comprehension to get each Pokémon's starting letter
starting_letters = [name[0] for name in names]

# Collect the count of Pokémon for each starting_letter
starting_letters_count = Counter(starting_letters)
print(starting_letters_count)

Counter({'Water': 105, 'Normal': 92, 'Bug': 65, 'Grass': 64, 'Fire': 48, 'Psychic': 46, 'Rock': 41, 'Electric': 40, 'Ground': 30, 'Dark': 28, 'Poison': 28, 'Dragon': 25, 'Fighting': 25, 'Ice': 23, 'Steel': 21, 'Ghost': 20, 'Fairy': 17, 'Flying': 2}) 

Counter({5: 122, 3: 103, 1: 99, 4: 78, 2: 51, 6: 47}) 

Counter({'S': 102, 'M': 58, 'C': 55, 'P': 47, 'G': 46, 'D': 41, 'B': 39, 'T': 35, 'L': 33, 'A': 32, 'R': 30, 'H': 27, 'F': 26, 'K': 25, 'W': 23, 'V': 22, 'E': 21, 'N': 16, 'Z': 9, 'J': 7, 'O': 6, 'I': 5, 'U': 5, 'Q': 4, 'Y': 4, 'X': 2})


### set theory 

* `set` datatype methods: 
  * `intersection()`: all elements that are in both sets
  * `difference()`: all elements in one set but not the other
  * `symmetric_difference()`: all elements in exactly one set
  * `union()`: all elements that are in either set

```python

# comparing objects with loops - very inefficient

list_a = ['bulbasaur', 'charmander', 'squirtle']
list_b = ['caterpie', 'pidgey', 'squirtle']

in_common = []

for pokemon_a in list_a:
    for pokemon_b in list_b:
        if pokemon_a == pokemon_b:
            in_common.append(pokemon_a)


# set data type

set_a = set(list_a)
set_b = set(list_b)

set_a.intersection(set_b) # in common
set_a.difference(set_b) # in set a not in b
set_a.symmetric_difference(set_b) # elements in exactly one list but not both
set_a.union(set_b) # collects unique elements across sets

# membership testing in a set is much faster than in a list or tuple
names_set = set(names)
%timeit 'Zubat' in names_set

# sets - unique elements

unique_types_set = set(primary_types)


```
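
The four set methods listed above also have operator forms. A minimal sketch, reusing the two small lists from the block above (the result names `in_common_set`, `only_a`, `exactly_one` and `either` are illustrative, not from the course):

```python
list_a = ['bulbasaur', 'charmander', 'squirtle']
list_b = ['caterpie', 'pidgey', 'squirtle']

set_a = set(list_a)
set_b = set(list_b)

# Operator forms of the four set methods (both operands must be sets)
in_common_set = set_a & set_b   # same as set_a.intersection(set_b)
only_a = set_a - set_b          # same as set_a.difference(set_b)
exactly_one = set_a ^ set_b     # same as set_a.symmetric_difference(set_b)
either = set_a | set_b          # same as set_a.union(set_b)

# sorted() is used only to get a stable print order; sets are unordered
print(sorted(in_common_set))
print(sorted(only_a))
print(sorted(exactly_one))
print(sorted(either))
```

The methods accept any iterable as their argument, while the operators require a set on both sides, so the method form is the safer choice when one input is still a list.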

In [18]:
ash_pokedex = ['Pikachu', 'Bulbasaur', 'Koffing', 'Spearow', 'Vulpix', 'Wigglytuff', 'Zubat', 'Rattata', 'Psyduck', 'Squirtle']
misty_pokedex = ['Krabby', 'Horsea', 'Slowbro', 'Tentacool', 'Vaporeon', 'Magikarp', 'Poliwag', 'Starmie', 'Psyduck', 'Squirtle']

# Convert both lists to sets
ash_set = set(ash_pokedex)
misty_set = set(misty_pokedex)

# Find the Pokémon that exist in both sets
both = ash_set.intersection(misty_set)
print(both)

# Find the Pokémon that Ash has and Misty does not have
ash_only = ash_set.difference(misty_set)
print(ash_only)

# Find the Pokémon that are in only one set (not both)
unique_to_set = ash_set.symmetric_difference(misty_set)
print(unique_to_set)

{'Psyduck', 'Squirtle'}
{'Pikachu', 'Koffing', 'Spearow', 'Rattata', 'Bulbasaur', 'Wigglytuff', 'Vulpix', 'Zubat'}
{'Magikarp', 'Vaporeon', 'Starmie', 'Pikachu', 'Krabby', 'Koffing', 'Spearow', 'Slowbro', 'Rattata', 'Tentacool', 'Horsea', 'Bulbasaur', 'Wigglytuff', 'Vulpix', 'Poliwag', 'Zubat'}


In [19]:
brock_pokedex = ['Onix', 'Geodude', 'Zubat', 'Golem', 'Vulpix', 'Tauros', 'Kabutops', 'Omastar', 'Machop', 'Dugtrio']

# Convert Brock's Pokédex to a set
brock_pokedex_set = set(brock_pokedex)
print(brock_pokedex_set)

# Check if Psyduck is in Ash's list and Brock's set
print('Psyduck' in ash_pokedex)
print('Psyduck' in brock_pokedex_set)

# Check if Machop is in Ash's list and Brock's set
print('Machop' in ash_pokedex)
print('Machop' in brock_pokedex_set)

{'Geodude', 'Machop', 'Omastar', 'Dugtrio', 'Kabutops', 'Onix', 'Golem', 'Vulpix', 'Zubat', 'Tauros'}
True
False
False
True


In [20]:
%timeit 'Psyduck' in ash_pokedex
%timeit 'Psyduck' in brock_pokedex_set
%timeit 'Machop' in ash_pokedex
%timeit 'Machop' in brock_pokedex_set

73 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
22.6 ns ± 0.499 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
82 ns ± 2.27 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
21.8 ns ± 0.277 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [21]:
def find_unique_items(data):
    uniques = []

    for item in data:
        if item not in uniques:
            uniques.append(item)

    return uniques

# Use the provided function to collect unique Pokémon names
uniq_names_func = find_unique_items(names)
print(len(uniq_names_func))

# Use find_unique_items() to collect unique Pokémon names
uniq_names_func = find_unique_items(names)
print(len(uniq_names_func))

# Convert the names list to a set to collect unique Pokémon names
uniq_names_set = set(names)
print(len(uniq_names_set))

# Check that both unique collections are equivalent
print(sorted(uniq_names_func) == sorted(uniq_names_set))

720
720
720
True


In [25]:
%timeit find_unique_items(names)
%timeit set(names)

1.91 ms ± 83.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.49 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [26]:
# Use the best approach to collect unique primary types and generations
uniq_types = set(primary_types) 
uniq_gens = set(generations)
print(uniq_types, uniq_gens, sep='\n') 

{'Ground', 'Ghost', 'Poison', 'Fairy', 'Steel', 'Fighting', 'Flying', 'Water', 'Psychic', 'Bug', 'Electric', 'Fire', 'Normal', 'Rock', 'Grass', 'Ice', 'Dragon', 'Dark'}
{1, 2, 3, 4, 5, 6}


### eliminating loops

**looping patterns** - costly
* `for` loop: iterate over sequence piece by piece
* `while` loop: repeat loop as long as condition is met
* nested loops: use one loop inside another loop

**benefits**
* fewer lines of code
* better code readability
  * 'flat is better than nested'
* efficiency gains

```python

poke_stats = [
    [90, 92, 75, 60],
    [25, 20, 15, 90],
    [65, 130, 60, 75],
    ...
]


# loop approach

totals = []
for row in poke_stats:
    totals.append(sum(row))

# list comprehension
totals_comp = [sum(row) for row in poke_stats]

# built-in map() function
totals_map = [*map(sum, poke_stats)]

```


using NumPy to perform calculations on the entire array at once, instead of looping over rows

```python

import numpy as np

poke_stats = np.array([
    [90, 92, 75, 60],
    [25, 20, 15, 90],
    [65, 130, 60, 75],
    ...
])

avgs_np = poke_stats.mean(axis=1) # mean of each row calculated on array - super efficient

```
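
The blocks above elide most of the stats table, so they cannot be run as written. A runnable sketch, assuming only the three rows that are visible above (`stats_np` and `totals_np` are illustrative names), checks that the loop, the comprehension, `map()` and NumPy all agree:

```python
import numpy as np

# Only the three rows shown in the notes; the full table is not reproduced here
poke_stats = [
    [90, 92, 75, 60],
    [25, 20, 15, 90],
    [65, 130, 60, 75],
]

# Explicit loop
totals = []
for row in poke_stats:
    totals.append(sum(row))

# List comprehension and map() produce the same totals without the append
totals_comp = [sum(row) for row in poke_stats]
totals_map = [*map(sum, poke_stats)]

# NumPy: one call over the whole 2-D array, axis=1 works across each row
stats_np = np.array(poke_stats)
totals_np = stats_np.sum(axis=1)
avgs_np = stats_np.mean(axis=1)

print(totals, totals_comp, totals_map, totals_np.tolist())
print(avgs_np.tolist())
```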

In [33]:
gen1_gen2_name_lengths_loop = []

for name,gen in zip(names, generations):
    if gen < 3:
        name_length = len(name)
        poke_tuple = (name, name_length)
        gen1_gen2_name_lengths_loop.append(poke_tuple)
print(gen1_gen2_name_lengths_loop[:5])

[('Abomasnow', 9), ('Abra', 4), ('Absol', 5), ('Aipom', 5), ('Alomomola', 9)]


In [36]:
# Collect Pokémon that belong to generation 1 or generation 2
gen1_gen2_pokemon = [name for name,gen in zip(names, generations) if gen < 3]

# Create a map object that stores the name lengths of gen1_gen2_pokemon
name_lengths_map = map(len, gen1_gen2_pokemon)

# Combine gen1_gen2_pokemon and name_lengths_map into a list
gen1_gen2_name_lengths = [*zip(gen1_gen2_pokemon, name_lengths_map)]

print(gen1_gen2_name_lengths[:5])

[('Abomasnow', 9), ('Abra', 4), ('Absol', 5), ('Aipom', 5), ('Alomomola', 9)]


In [37]:
g1_g2_name_lengths = [(name, len(name)) for name,gen in zip(names, generations) if gen < 3]
print(g1_g2_name_lengths[:5])

[('Abomasnow', 9), ('Abra', 4), ('Absol', 5), ('Aipom', 5), ('Alomomola', 9)]


In [None]:
# Create a total stats array
total_stats_np = stats.sum(axis=1)

# Create an average stats array
avg_stats_np = stats.mean(axis=1)

# Combine names, total_stats_np, and avg_stats_np into a list
poke_list_np = [*zip(names, total_stats_np, avg_stats_np)]

print(poke_list_np == poke_list, '\n')
print(poke_list_np[:3])
print(poke_list[:3], '\n')
top_3 = sorted(poke_list_np, key=lambda x: x[1], reverse=True)[:3]
print('3 strongest Pokémon:\n{}'.format(top_3))

### better loops (if they are needed)

* understand what is being done with each loop iteration
* move one-time calculations outside (above) the loop
* use holistic conversions outside (below) the loop
* anything that can be done once should be outside the loop

```python
import numpy as np

names = ['absol', 'aron', 'jynx', 'natu', 'onix']
attacks = np.array([130, 70, 50, 50, 45])
for pokemon, attack in zip(names, attacks):
    total_attack_avg = attacks.mean() # this is being calculated each time - but should be outside (above) loop
    if attack > total_attack_avg:
        print(
            "{}'s attack: {} > average: {}!"
            .format(pokemon, attack, total_attack_avg)
        )

## holistic conversions

names = ['pikachu', 'squirtle', ...]
legend_status = [False, False, True, ...]
generations = [1, 1, 1, ...]
poke_data = []

for poke_tuple in zip(names, legend_status, generations):
    poke_list = list(poke_tuple)
    poke_data.append(poke_list)
print(poke_data)

## more efficient:

poke_data_tuples = []
for poke_tuple in zip(names, legend_status, generations):
    poke_data_tuples.append(poke_tuple)
poke_data = [*map(list, poke_data_tuples)] # moved out and uses map 
print(poke_data)


```
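
The first loop above recomputes the average on every iteration. A minimal sketch of the improved version, with the one-time calculation moved above the loop (the `above_avg` list is a hypothetical addition so the result can be inspected):

```python
import numpy as np

names = ['absol', 'aron', 'jynx', 'natu', 'onix']
attacks = np.array([130, 70, 50, 50, 45])

# One-time calculation: done once, above the loop
total_attack_avg = attacks.mean()

above_avg = []  # hypothetical helper list, not part of the original notes
for pokemon, attack in zip(names, attacks):
    if attack > total_attack_avg:
        above_avg.append(pokemon)
        print(
            "{}'s attack: {} > average: {}!"
            .format(pokemon, attack, total_attack_avg)
        )
```

The printed result is the same as before; the only change is that `attacks.mean()` runs once rather than once per Pokémon.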

In [38]:
# Import Counter
from collections import Counter

# Collect the count of each generation
gen_counts = Counter(generations)

# Improve for loop by moving one calculation above the loop
total_count = len(generations)

for gen,count in gen_counts.items():
    gen_percent = round(count / total_count *100, 2)
    print('generation {}: count = {:3} percentage = {}'
          .format(gen, count, gen_percent))

generation 1: count =  99 percentage = 19.8
generation 5: count = 122 percentage = 24.4
generation 3: count = 103 percentage = 20.6
generation 6: count =  47 percentage = 9.4
generation 4: count =  78 percentage = 15.6
generation 2: count =  51 percentage = 10.2


In [None]:
from itertools import combinations

# Collect all possible pairs using combinations()
possible_pairs = [*combinations(pokemon_types, 2)]

# Create an empty list called enumerated_tuples
enumerated_tuples = []

# Append each enumerated_pair_tuple to the empty list above
for i,pair in enumerate(possible_pairs, 1):
    enumerated_pair_tuple = (i,) + pair
    enumerated_tuples.append(enumerated_pair_tuple)

# Convert all tuples in enumerated_tuples to a list
enumerated_pairs = [*map(list, enumerated_tuples)]
print(enumerated_pairs)
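
`pokemon_types` is not defined anywhere in these notes, so the cell above cannot be run on its own. A self-contained sketch of the same pattern, assuming a hypothetical three-item list:

```python
from itertools import combinations

# Hypothetical stand-in for pokemon_types, which is not defined in these notes
pokemon_types = ['Bug', 'Fire', 'Ghost']

# All unordered pairs of distinct items
possible_pairs = [*combinations(pokemon_types, 2)]

enumerated_tuples = []
for i, pair in enumerate(possible_pairs, 1):
    # (i,) is a one-element tuple, so + concatenates it with the pair
    enumerated_tuples.append((i,) + pair)

# Holistic conversion: done once, below the loop
enumerated_pairs = [*map(list, enumerated_tuples)]
print(enumerated_pairs)
```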

In [None]:
# Calculate the total HP avg and total HP standard deviation
hp_avg = hps.mean()
hp_std = hps.std()

# Use NumPy to eliminate the previous for loop
z_scores = (hps - hp_avg)/hp_std

# Combine names, hps, and z_scores
poke_zscores2 = [*zip(names, hps, z_scores)]
print(*poke_zscores2[:3], sep='\n')

# Use list comprehension with the same logic as the highest_hp_pokemon code block
highest_hp_pokemon2 = [(name, hp, z_score) for name,hp,z_score in poke_zscores2 if z_score > 2]
print(*highest_hp_pokemon2, sep='\n')
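
The `names` and `hps` arrays used in the cell above are not shown in these notes. A runnable sketch of the same z-score calculation, assuming made-up sample values (every name and number below is hypothetical):

```python
import numpy as np

# Hypothetical sample data; the real names and hps arrays are not shown here
names = ['poke_a', 'poke_b', 'poke_c', 'poke_d', 'poke_e', 'poke_f']
hps = np.array([10, 10, 10, 10, 10, 100])

hp_avg = hps.mean()
hp_std = hps.std()

# Broadcasting: the scalars are applied to every element, no loop needed
z_scores = (hps - hp_avg) / hp_std

poke_zscores = [*zip(names, hps, z_scores)]

# Keep only entries more than two standard deviations above the mean
highest_hp = [(name, hp, z) for name, hp, z in poke_zscores if z > 2]
print(highest_hp)
```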

## Basic Pandas optimisations

### iterrows()

* iterating with `.iterrows()`
  * yields each DataFrame row as an `(index, pandas Series)` tuple

```python

win_perc_list = []

for i, row in baseball_df.iterrows():
    wins = row['W']
    games_played = row['G']

    win_perc = calc_win_perc(wins, games_played)

    win_perc_list.append(win_perc)

baseball_df['WP'] = win_perc_list


```


In [6]:
# import baseball.csv with pandas
import pandas as pd
baseball = pd.read_csv('baseball.csv')

# print Pittsburgh baseball data
#print(baseball[baseball['Team']=='PIT'])

# Pittsburgh 2008-2012
pit = baseball[(baseball['Year'] >= 2008) & (baseball['Year'] <= 2012) & (baseball['Team'] == 'PIT')]
#print(pit_df)

# keep cols Team, League, Year, RS, RA, W, G, Playoffs
cols = ['Team', 'League', 'Year', 'RS', 'RA', 'W', 'G', 'Playoffs']
pit_df = pit[cols]
#reset index
pit_df = pit_df.reset_index(drop=True)
print(pit_df)

  Team League  Year   RS   RA   W    G  Playoffs
0  PIT     NL  2012  651  674  79  162         0
1  PIT     NL  2011  610  712  72  162         0
2  PIT     NL  2010  587  866  57  162         0
3  PIT     NL  2009  636  768  62  161         0
4  PIT     NL  2008  735  884  67  162         0


In [7]:
# Iterate over pit_df and print each index variable, row, and row type
for i,row in pit_df.iterrows():
    print(i)
    print(row)
    print(type(row))

# Use one variable instead of two to store the result of .iterrows()
for row_tuple in pit_df.iterrows():
    print(row_tuple)

# Print the row and type of each row
for row_tuple in pit_df.iterrows():
    print(row_tuple)
    print(type(row_tuple))


0
Team         PIT
League        NL
Year        2012
RS           651
RA           674
W             79
G            162
Playoffs       0
Name: 0, dtype: object
<class 'pandas.core.series.Series'>
1
Team         PIT
League        NL
Year        2011
RS           610
RA           712
W             72
G            162
Playoffs       0
Name: 1, dtype: object
<class 'pandas.core.series.Series'>
2
Team         PIT
League        NL
Year        2010
RS           587
RA           866
W             57
G            162
Playoffs       0
Name: 2, dtype: object
<class 'pandas.core.series.Series'>
3
Team         PIT
League        NL
Year        2009
RS           636
RA           768
W             62
G            161
Playoffs       0
Name: 3, dtype: object
<class 'pandas.core.series.Series'>
4
Team         PIT
League        NL
Year        2008
RS           735
RA           884
W             67
G            162
Playoffs       0
Name: 4, dtype: object
<class 'pandas.core.series.Series'>
(0, Team       

In [8]:
def calc_run_diff(runs_scored, runs_allowed):

    run_diff = runs_scored - runs_allowed

    return run_diff

# create Giants df 2008-2012
giants = baseball[(baseball['Year'] >= 2008) & (baseball['Year'] <= 2012) & (baseball['Team'] == 'SFG')]
# keep cols Team, League, Year, RS, RA, W, G, Playoffs
cols = ['Team', 'League', 'Year', 'RS', 'RA', 'W', 'G', 'Playoffs']
giants_df = giants[cols].reset_index(drop=True)
print(giants_df)

  Team League  Year   RS   RA   W    G  Playoffs
0  SFG     NL  2012  718  649  94  162         1
1  SFG     NL  2011  570  578  86  162         0
2  SFG     NL  2010  697  583  92  162         1
3  SFG     NL  2009  657  611  88  162         0
4  SFG     NL  2008  640  759  72  162         0


In [9]:
# Create an empty list to store run differentials
run_diffs = []

# Write a for loop and collect runs allowed and runs scored for each row
for i,row in giants_df.iterrows():
    runs_scored = row['RS']
    runs_allowed = row['RA']

    # Use the provided function to calculate run_diff for each row
    run_diff = calc_run_diff(runs_scored, runs_allowed)
    # Append each run differential to the output list
    run_diffs.append(run_diff)

giants_df['RD'] = run_diffs
print(giants_df)

  Team League  Year   RS   RA   W    G  Playoffs   RD
0  SFG     NL  2012  718  649  94  162         1   69
1  SFG     NL  2011  570  578  86  162         0   -8
2  SFG     NL  2010  697  583  92  162         1  114
3  SFG     NL  2009  657  611  88  162         0   46
4  SFG     NL  2008  640  759  72  162         0 -119


### itertuples()

`.itertuples()` yields each row as a namedtuple, so values are accessed by attribute (dot notation) rather than with square brackets

```python 

for row_tuple in team_wins_df.iterrows():
    print(row_tuple[1]['Team'])

# raises a TypeError - namedtuples do not support square-bracket lookup by column name like a pandas Series
for row_namedtuple in team_wins_df.itertuples():
    print(row_namedtuple['Team'])

# use attribute (dot) access instead
for row_namedtuple in team_wins_df.itertuples():
    print(row_namedtuple.Team)
```
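
`team_wins_df` is not defined in these notes. A runnable sketch of the same point, assuming a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical sample frame standing in for team_wins_df
team_wins_df = pd.DataFrame({'Team': ['ARI', 'ATL'], 'W': [81, 94]})

teams = []
for row_namedtuple in team_wins_df.itertuples():
    # Attribute access by column name works
    teams.append(row_namedtuple.Team)
    # Integer positions work too; position 0 holds the index
    assert row_namedtuple[1] == row_namedtuple.Team

# Looking a column up by string key fails on a namedtuple
first = next(team_wins_df.itertuples())
try:
    first['Team']
    raised = False
except TypeError:
    raised = True

print(teams, raised)
```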

In [11]:
## create rangers df
rangers = baseball[(baseball['Team'] == 'TEX')]
# keep cols Team, League, Year, RS, RA, W, G, Playoffs
cols = ['Team', 'League', 'Year', 'RS', 'RA', 'W', 'G', 'Playoffs']
rangers_df = rangers[cols].reset_index(drop=True)
print(rangers_df)

   Team League  Year   RS   RA   W    G  Playoffs
0   TEX     AL  2012  808  707  93  162         1
1   TEX     AL  2011  855  677  96  162         1
2   TEX     AL  2010  787  687  90  162         1
3   TEX     AL  2009  784  740  87  162         0
4   TEX     AL  2008  901  967  79  162         0
5   TEX     AL  2007  816  844  75  162         0
6   TEX     AL  2006  835  784  80  162         0
7   TEX     AL  2005  865  858  79  162         0
8   TEX     AL  2004  860  794  89  162         0
9   TEX     AL  2003  826  969  71  162         0
10  TEX     AL  2002  843  882  72  162         0
11  TEX     AL  2001  890  968  73  162         0
12  TEX     AL  2000  848  974  71  162         0
13  TEX     AL  1999  945  859  95  162         1
14  TEX     AL  1998  940  871  88  162         1
15  TEX     AL  1997  807  823  77  162         0
16  TEX     AL  1996  928  799  90  163         1
17  TEX     AL  1993  835  751  86  162         0
18  TEX     AL  1992  682  753  77  162         0


In [12]:
# Loop over the DataFrame and print each row
for row_namedtuple in rangers_df.itertuples():
  print(row_namedtuple)

Pandas(Index=0, Team='TEX', League='AL', Year=2012, RS=808, RA=707, W=93, G=162, Playoffs=1)
Pandas(Index=1, Team='TEX', League='AL', Year=2011, RS=855, RA=677, W=96, G=162, Playoffs=1)
Pandas(Index=2, Team='TEX', League='AL', Year=2010, RS=787, RA=687, W=90, G=162, Playoffs=1)
Pandas(Index=3, Team='TEX', League='AL', Year=2009, RS=784, RA=740, W=87, G=162, Playoffs=0)
Pandas(Index=4, Team='TEX', League='AL', Year=2008, RS=901, RA=967, W=79, G=162, Playoffs=0)
Pandas(Index=5, Team='TEX', League='AL', Year=2007, RS=816, RA=844, W=75, G=162, Playoffs=0)
Pandas(Index=6, Team='TEX', League='AL', Year=2006, RS=835, RA=784, W=80, G=162, Playoffs=0)
Pandas(Index=7, Team='TEX', League='AL', Year=2005, RS=865, RA=858, W=79, G=162, Playoffs=0)
Pandas(Index=8, Team='TEX', League='AL', Year=2004, RS=860, RA=794, W=89, G=162, Playoffs=0)
Pandas(Index=9, Team='TEX', League='AL', Year=2003, RS=826, RA=969, W=71, G=162, Playoffs=0)
Pandas(Index=10, Team='TEX', League='AL', Year=2002, RS=843, RA=882, W

In [14]:
# Loop over the DataFrame and print each row's Index, Year and Wins (W)
for row in rangers_df.itertuples():
  i = row.Index
  year = row.Year
  wins = row.W
  #print(i, year, wins)
  # Check if rangers made Playoffs (1 means yes; 0 means no)
  if row.Playoffs == 1:
    print(i, year, wins)

0 2012 93
1 2011 96
2 2010 90
13 1999 95
14 1998 88
16 1996 90


In [15]:
# create yankees_df
yankees = baseball[(baseball['Team'] == 'NYY')]
# keep cols Team, League, Year, RS, RA, W, G, Playoffs
cols = ['Team', 'League', 'Year', 'RS', 'RA', 'W', 'G', 'Playoffs']
yankees_df = yankees[cols].reset_index(drop=True)
print(yankees_df)

   Team League  Year   RS   RA    W    G  Playoffs
0   NYY     AL  2012  804  668   95  162         1
1   NYY     AL  2011  867  657   97  162         1
2   NYY     AL  2010  859  693   95  162         1
3   NYY     AL  2009  915  753  103  162         1
4   NYY     AL  2008  789  727   89  162         0
5   NYY     AL  2007  968  777   94  162         1
6   NYY     AL  2006  930  767   97  162         1
7   NYY     AL  2005  886  789   95  162         1
8   NYY     AL  2004  897  808  101  162         1
9   NYY     AL  2003  877  716  101  163         1
10  NYY     AL  2002  897  697  103  161         1
11  NYY     AL  2001  804  713   95  161         1
12  NYY     AL  2000  871  814   87  161         1
13  NYY     AL  1999  900  731   98  162         1
14  NYY     AL  1998  965  656  114  162         1
15  NYY     AL  1997  891  688   96  162         1
16  NYY     AL  1996  871  787   92  162         1
17  NYY     AL  1993  821  761   88  162         0
18  NYY     AL  1992  733  746 

In [16]:
run_diffs = []

# Loop over the DataFrame and calculate each row's run differential
for row in yankees_df.itertuples():
    
    runs_scored = row.RS
    runs_allowed = row.RA

    run_diff = calc_run_diff(runs_scored, runs_allowed)
    
    run_diffs.append(run_diff)

# Append new column
yankees_df['RD'] = run_diffs
print(yankees_df)

   Team League  Year   RS   RA    W    G  Playoffs   RD
0   NYY     AL  2012  804  668   95  162         1  136
1   NYY     AL  2011  867  657   97  162         1  210
2   NYY     AL  2010  859  693   95  162         1  166
3   NYY     AL  2009  915  753  103  162         1  162
4   NYY     AL  2008  789  727   89  162         0   62
5   NYY     AL  2007  968  777   94  162         1  191
6   NYY     AL  2006  930  767   97  162         1  163
7   NYY     AL  2005  886  789   95  162         1   97
8   NYY     AL  2004  897  808  101  162         1   89
9   NYY     AL  2003  877  716  101  163         1  161
10  NYY     AL  2002  897  697  103  161         1  200
11  NYY     AL  2001  804  713   95  161         1   91
12  NYY     AL  2000  871  814   87  161         1   57
13  NYY     AL  1999  900  731   98  162         1  169
14  NYY     AL  1998  965  656  114  162         1  309
15  NYY     AL  1997  891  688   96  162         1  203
16  NYY     AL  1996  871  787   92  162        

### alternative to looping in pandas

* `.apply()` method - applies a function along an axis of the DataFrame
  * specify the axis to apply along (`0` for columns, which is the default; `1` for rows)
* can be used with anonymous functions (`lambda` functions)
* example: 
```python

run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis = 1
)

baseball_df['RD'] = run_diffs_apply

```
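
A self-contained sketch of both axes, assuming a hypothetical two-row sample in place of `baseball_df` (`calc_run_diff` is repeated from earlier in the notes; `col_totals` is an illustrative name):

```python
import pandas as pd

def calc_run_diff(runs_scored, runs_allowed):
    return runs_scored - runs_allowed

# Hypothetical two-row sample in place of the full baseball_df
baseball_df = pd.DataFrame({'RS': [718, 570], 'RA': [649, 578]})

# axis=1: the lambda receives one row (a Series) at a time
baseball_df['RD'] = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1,
)

# axis=0: the function receives one column (a Series) at a time
col_totals = baseball_df[['RS', 'RA']].apply(sum, axis=0)

print(baseball_df)
print(col_totals)
```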


In [17]:
# create tampa bays df
rays = baseball[(baseball['Team'] == 'TBR')]
# keep cols Year, RS, RA, W, Playoffs
cols = ['Year', 'RS', 'RA', 'W', 'Playoffs']
# make 'Year' the index
rays_df = rays[cols].set_index('Year')

print(rays_df)


       RS   RA   W  Playoffs
Year                        
2012  697  577  90         0
2011  707  614  91         1
2010  802  649  96         1
2009  803  754  84         0
2008  774  671  97         1


In [19]:
# sum of all columns
stat_totals = rays_df.apply(sum, axis=0)
print(stat_totals)

# Gather total runs scored in all games per year
total_runs_scored = rays_df[['RS', 'RA']].apply(sum, axis=1)
print(total_runs_scored)

RS          3783
RA          3265
W            458
Playoffs       3
dtype: int64
Year
2012    1274
2011    1321
2010    1451
2009    1557
2008    1445
dtype: int64


In [20]:
def text_playoffs(num_playoffs): 
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No' 

# Convert numeric playoffs to text by applying text_playoffs()
textual_playoffs = rays_df.apply(lambda row: text_playoffs(row['Playoffs']), axis=1)
print(textual_playoffs)

Year
2012     No
2011    Yes
2010    Yes
2009     No
2008    Yes
dtype: object


In [22]:
# create diamondbacks df
dbacks_df = baseball[(baseball['Team'] == 'ARI')]
# keep cols Team, League, Year, RS, RA, W, G, Playoffs
cols = ['Team', 'League', 'Year', 'RS', 'RA', 'W', 'G', 'Playoffs']
dbacks_df = dbacks_df[cols].reset_index(drop=True)
print(dbacks_df)

def calc_win_perc(wins, games_played):
    win_perc = wins / games_played
    return np.round(win_perc,2)

   Team League  Year   RS   RA    W    G  Playoffs
0   ARI     NL  2012  734  688   81  162         0
1   ARI     NL  2011  731  662   94  162         1
2   ARI     NL  2010  713  836   65  162         0
3   ARI     NL  2009  720  782   70  162         0
4   ARI     NL  2008  720  706   82  162         0
5   ARI     NL  2007  712  732   90  162         1
6   ARI     NL  2006  773  788   76  162         0
7   ARI     NL  2005  696  856   77  162         0
8   ARI     NL  2004  615  899   51  162         0
9   ARI     NL  2003  717  685   84  162         0
10  ARI     NL  2002  819  674   98  162         1
11  ARI     NL  2001  818  677   92  162         1
12  ARI     NL  2000  792  754   85  162         0
13  ARI     NL  1999  908  676  100  162         1
14  ARI     NL  1998  665  812   65  162         0


In [24]:
import numpy as np
# Create a win percentage Series 
win_percs = dbacks_df.apply(lambda row: calc_win_perc(row['W'], row['G']), axis=1)
print(win_percs, '\n')

# Append a new column to dbacks_df
dbacks_df['WP'] = win_percs
print(dbacks_df, '\n')

# Display dbacks_df where WP is greater than or equal to 0.50
print(dbacks_df[dbacks_df['WP'] >= 0.50])

0     0.50
1     0.58
2     0.40
3     0.43
4     0.51
5     0.56
6     0.47
7     0.48
8     0.31
9     0.52
10    0.60
11    0.57
12    0.52
13    0.62
14    0.40
dtype: float64 

   Team League  Year   RS   RA    W    G  Playoffs    WP
0   ARI     NL  2012  734  688   81  162         0  0.50
1   ARI     NL  2011  731  662   94  162         1  0.58
2   ARI     NL  2010  713  836   65  162         0  0.40
3   ARI     NL  2009  720  782   70  162         0  0.43
4   ARI     NL  2008  720  706   82  162         0  0.51
5   ARI     NL  2007  712  732   90  162         1  0.56
6   ARI     NL  2006  773  788   76  162         0  0.47
7   ARI     NL  2005  696  856   77  162         0  0.48
8   ARI     NL  2004  615  899   51  162         0  0.31
9   ARI     NL  2003  717  685   84  162         0  0.52
10  ARI     NL  2002  819  674   98  162         1  0.60
11  ARI     NL  2001  818  677   92  162         1  0.57
12  ARI     NL  2000  792  754   85  162         0  0.52
13  ARI     NL  1999

### optimal pandas

* eliminate loops 
* pandas built on numpy
* broadcasting/vectorising is available

* broadcasting - very efficient

`run_diffs_np = baseball_df['RS'].values - baseball_df['RA'].values`
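
A runnable sketch of the one-line expression above, assuming a hypothetical three-row frame built from the first three RS/RA values of the baseball data, with a row-by-row version for comparison:

```python
import pandas as pd

# Hypothetical three-row sample in place of the full baseball_df
baseball_df = pd.DataFrame({'RS': [734, 700, 712], 'RA': [688, 600, 705]})

# Row-by-row version, shown only for comparison
run_diffs_loop = [row.RS - row.RA for row in baseball_df.itertuples()]

# Vectorised version: .values exposes the underlying NumPy arrays,
# and the subtraction is applied element-wise in one operation
run_diffs_np = baseball_df['RS'].values - baseball_df['RA'].values

baseball_df['RD'] = run_diffs_np
print(baseball_df)
```

Both versions give the same numbers; the vectorised one avoids building a Python object for every row. `Series.to_numpy()` can be used in place of `.values` and returns the same array here.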

In [25]:
# Use the W array and G array to calculate win percentages
win_percs_np = calc_win_perc(baseball['W'].values, baseball['G'].values)

# Append a new column to baseball that stores all win percentages
baseball['WP'] = win_percs_np

print(baseball.head())

  Team League  Year   RS   RA   W    OBP    SLG     BA  Playoffs  RankSeason  \
0  ARI     NL  2012  734  688  81  0.328  0.418  0.259         0         NaN   
1  ATL     NL  2012  700  600  94  0.320  0.389  0.247         1         4.0   
2  BAL     AL  2012  712  705  93  0.311  0.417  0.247         1         5.0   
3  BOS     AL  2012  734  806  69  0.315  0.415  0.260         0         NaN   
4  CHC     NL  2012  613  759  61  0.302  0.378  0.240         0         NaN   

   RankPlayoffs    G   OOBP   OSLG    WP  
0           NaN  162  0.317  0.415  0.50  
1           5.0  162  0.306  0.378  0.58  
2           4.0  162  0.315  0.403  0.57  
3           NaN  162  0.331  0.428  0.43  
4           NaN  162  0.335  0.424  0.38  


In [30]:
%%timeit
win_percs_list = []

for i in range(len(baseball)):
    row = baseball.iloc[i]

    wins = row['W']
    games_played = row['G']

    win_perc = calc_win_perc(wins, games_played)

    win_percs_list.append(win_perc)

baseball['WP'] = win_percs_list

38 ms ± 806 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [32]:
%%timeit 

win_percs_np = calc_win_perc(baseball['W'].values, baseball['G'].values)
baseball['WP'] = win_percs_np

46.9 µs ± 1.41 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [33]:
def predict_win_perc(RS, RA):
    prediction = RS ** 2 / (RS ** 2 + RA ** 2)
    return np.round(prediction, 2)

In [34]:
win_perc_preds_loop = []

# Use a loop and .itertuples() to collect each row's predicted win percentage
for row in baseball.itertuples():
    runs_scored = row.RS
    runs_allowed = row.RA
    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
    win_perc_preds_loop.append(win_perc_pred)

# Apply predict_win_perc to each row of the DataFrame
win_perc_preds_apply = baseball.apply(lambda row: predict_win_perc(row['RS'], row['RA']), axis=1)

# Calculate the win percentage predictions using NumPy arrays
win_perc_preds_np = predict_win_perc(baseball['RS'].values, baseball['RA'].values)
baseball['WP_preds'] = win_perc_preds_np
print(baseball.head())

  Team League  Year   RS   RA   W    OBP    SLG     BA  Playoffs  RankSeason  \
0  ARI     NL  2012  734  688  81  0.328  0.418  0.259         0         NaN   
1  ATL     NL  2012  700  600  94  0.320  0.389  0.247         1         4.0   
2  BAL     AL  2012  712  705  93  0.311  0.417  0.247         1         5.0   
3  BOS     AL  2012  734  806  69  0.315  0.415  0.260         0         NaN   
4  CHC     NL  2012  613  759  61  0.302  0.378  0.240         0         NaN   

   RankPlayoffs    G   OOBP   OSLG    WP  WP_preds  
0           NaN  162  0.317  0.415  0.50      0.53  
1           5.0  162  0.306  0.378  0.58      0.58  
2           4.0  162  0.315  0.403  0.57      0.50  
3           NaN  162  0.331  0.428  0.43      0.45  
4           NaN  162  0.335  0.424  0.38      0.39  


In [38]:
%%timeit
win_perc_preds_loop = []
for row in baseball.itertuples():
    runs_scored = row.RS
    runs_allowed = row.RA
    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
    win_perc_preds_loop.append(win_perc_pred)


7.03 ms ± 150 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [39]:
%%timeit
win_perc_preds_apply = baseball.apply(lambda row: predict_win_perc(row['RS'], row['RA']), axis=1)

11.7 ms ± 317 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [40]:
%%timeit
win_perc_preds_np = predict_win_perc(baseball['RS'].values, baseball['RA'].values)

13.5 µs ± 83.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
