# Dictionaries and Frequency Tables: Takeaways

## Syntax

- Creating a dictionary:

In [7]:
# First way
dictionary = {'key_1': 1, 'key_2': 2}
# Second way
dictionary = {}
dictionary['key_1'] = 1
dictionary['key_2'] = 2

- Retrieving individual dictionary values:

In [8]:
dictionary = {'key_1': 100, 'key_2': 200}
dictionary['key_1'] # Outputs 100
dictionary['key_2'] # Outputs 200

200

- Checking whether a certain value exists in the dictionary as a key:

In [9]:
dictionary = {'key_1': 100, 'key_2': 200}
'key_1' in dictionary # Outputs True
'key_5' in dictionary # Outputs False
100 in dictionary # Outputs False

False

- Updating dictionary values:

In [17]:
dictionary = {'key_1': 100, 'key_2': 200}
dictionary['key_1'] += 600 # This will change the value to 700

- The index of a dictionary value is called a key. In '4+': 4433, the dictionary key is '4+', and
the dictionary value is 4433. As a whole, '4+': 4433 is a key-value pair.

- Dictionary values can be of any data type: strings, integers, floats, Booleans, lists, and even
dictionaries. Dictionary keys can be of almost any data type we've learned so far, except for lists
and dictionaries. If we use a list or a dictionary as a dictionary key, Python raises a TypeError.
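
As a minimal sketch of the key restriction described above, the snippet below builds one dictionary with valid keys and then tries to use a list as a key. The variable names are illustrative and not part of the original lesson.

```python
# Strings, integers, floats, and Booleans can all be used as keys.
valid_keys = {'4+': 4433, 9: 987, 1.5: 'a float key', True: 'a Boolean key'}

# A list cannot be used as a key; Python raises a TypeError.
try:
    invalid_keys = {[1, 2]: 'a list key'}
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```

The exact wording of the error message can differ between Python versions, but it names the list type as the problem.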

- We can check whether a certain value exists in the dictionary as a key using the in operator.
An in expression always returns a Boolean value.

- The number of times a unique value occurs is also called frequency. Tables that map unique
values to their frequencies are called frequency tables.

- When we iterate over a dictionary with a for loop, the looping is done by default over the
dictionary keys.
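
The frequency table and default iteration behavior described above can be sketched as follows. The `ratings` list is made-up sample data, and the variable names are assumptions chosen for illustration.

```python
ratings = ['4+', '9+', '4+', '12+', '4+', '9+']  # made-up sample data

# Build a frequency table: unique value -> number of occurrences.
content_ratings = {}
for rating in ratings:
    if rating in content_ratings:
        content_ratings[rating] += 1
    else:
        content_ratings[rating] = 1

# A for loop over a dictionary iterates over its keys by default.
keys_seen = []
for key in content_ratings:
    keys_seen.append(key)

print(content_ratings)
print(keys_seen)
```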

# Functions: Fundamentals: Takeaways

- Creating a function with a single parameter:

In [14]:
def square(number):
    return number**2

- Creating a function with more than one parameter:

In [15]:
def add(x, y):
    return x + y

- Reusing a function within another function's definition:

In [16]:
def add_to_square(x):
    return square(x) + 1000 # we defined square() above

- Generally, a function displays this pattern:
    - It takes in an input.
    - It does something to that input.
    - It gives back an output.
    
- In Python, we have built-in functions like sum(), max(), min(), len(), and print(),
and functions that we create ourselves.

- Structurally, a function is composed of a header (which contains the def statement), a body,
and a return statement.

- Input variables are called parameters, and the various values that parameters take are called
arguments. In def square(number), the number variable is a parameter. In square(number=6),
the value 6 is an argument that is passed to the parameter number.

- Arguments that are passed by name are called keyword arguments (the parameters give the
name). When we use multiple keyword arguments, the order we use doesn't make any practical
difference.

- Arguments that are passed by position are called positional arguments. When we use multiple
positional arguments, the order we use matters.
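
A short sketch of the difference between positional and keyword arguments. The `subtract()` function is a hypothetical example, chosen because subtraction makes the effect of argument order visible.

```python
def subtract(a, b):
    return a - b

# Positional arguments: the order decides which parameter gets which value.
positional_1 = subtract(10, 7)   # a is 10, b is 7
positional_2 = subtract(7, 10)   # a is 7, b is 10

# Keyword arguments: the names decide, so the order makes no difference.
keyword_1 = subtract(a=10, b=7)
keyword_2 = subtract(b=7, a=10)

print(positional_1, positional_2, keyword_1, keyword_2)
```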

- Debugging more complex functions can be a bit more challenging, but we can find the bugs by
reading the traceback.

- Initializing parameters with default arguments:

In [18]:
def add_value(x, constant=3.14):
    return x + constant

- Using multiple return statements:

In [19]:
def sum_or_difference(a, b, do_sum):
    if do_sum:
        return a + b
    return a - b

- Returning multiple variables:

In [20]:
def sum_and_difference(a, b):
    a_sum = a + b
    difference = a - b
    return a_sum, difference

sum_1, diff_1 = sum_and_difference(15, 10)

- We should avoid using the name of a built-in function to name a function or a variable because
the new name shadows the built-in function, making it unavailable under that name.
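
As an illustration of why reusing a built-in name causes trouble, the sketch below deliberately assigns a number to `max` and then tries to call it. Restoring the name through the standard `builtins` module is shown only to keep the example self-contained; in real code it is simpler to pick a different variable name.

```python
import builtins

max = 100  # deliberately reuses the name of the built-in max() function

# The name max now refers to an integer, so calling it fails.
try:
    largest = max([1, 2, 3])
    shadow_error = None
except TypeError as err:
    shadow_error = err

# The original function is still available in the builtins module.
max = builtins.max
largest = max([1, 2, 3])

print(type(shadow_error).__name__, largest)
```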

- Each built-in function is well documented in the official Python documentation.

- Parameters and return statements are not mandatory when we create a function.

- The code inside a function definition is executed only when the function is called.
When a function is called, the variables defined inside the function definition are saved into a
temporary memory that is erased immediately after the function finishes running. The temporary
memory associated with a function is isolated from the memory associated with the main
program (the main program is the part of the program outside function definitions).

- The part of a program where a variable can be accessed is often called scope. The variables
defined in the main program are said to be in the global scope, while the variables defined inside
a function are in the local scope.

- Python searches the global scope if a variable is not available in the local scope, but the reverse
doesn't apply. Python won't search the local scope if it doesn't find a variable in the global scope.
Even if it searched the local scope, the memory associated with a function is temporary, so the
search would be pointless.
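
The scope rules above can be sketched with a small example. The names `a_global`, `add_to_global()`, and `local_total` are hypothetical and used only for illustration.

```python
a_global = 10  # defined in the main program: global scope

def add_to_global(x):
    # local_total lives in the function's local scope.
    # a_global is not defined locally, so Python finds it in the global scope.
    local_total = x + a_global
    return local_total

result = add_to_global(5)

# Outside the function, the local variable is not accessible.
try:
    local_total
    local_is_visible = True
except NameError:
    local_is_visible = False

print(result, local_is_visible)
```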

# Cleaning and Preparing Data in Python: Takeaways

In [5]:
# Replace a substring within a string:
green_ball = "red ball".replace("red", "green")

# Remove a substring:
friend_removed = "hello there friend!".replace(" friend", "")

# Remove a series of characters from a string:
bad_chars = ["'", ",", ".", "!"]
string = "We'll remove apostrophes, commas, periods, and exclamation marks!"
for char in bad_chars:
    string = string.replace(char, "")

test_data = ["1912", "1929", "1913-1923",
             "(1951)", "1994", "1934",
             "c. 1915", "1995", "c. 1912",
             "(1988)", "2002", "1957-1959",
             "c. 1955.", "c. 1970's", 
             "C. 1990-1999"]

bad_chars = ["(",")","c","C",".","s","'", " "]

def strip_characters(string):
    for char in bad_chars:
        string = string.replace(char,"")
    return string

stripped_test_data = ['1912', '1929', '1913-1923',
                      '1951', '1994', '1934',
                      '1915', '1995', '1912',
                      '1988', '2002', '1957-1959',
                      '1955', '1970', '1990-1999']

# The dates above are inconsistent:
# some are a single year, some are ranges of years.
# When you encounter data like this, you need to decide how you'll proceed.
# Here, a range is replaced by the rounded average of its two years.

def process_date(string):
    if "-" not in string:
        return int(string)
    else:
        date_split = string.split("-")
        date_one = int(date_split[0])
        date_two = int(date_split[1])
        date = round((date_one + date_two) / 2)
        return date  

processed_test_data = []

for data in stripped_test_data:
    processed_test_data.append(process_date(data))


# Convert a string to title case:
Hello = "hello".title()

# Check a string for the existence of a substring:
if "car" in "carpet":
    print("The substring was found.")
else:
    print("The substring was not found.")

# Split a string into a list of strings:
split_on_dash = "1980-12-08".split("-")
    
# Slice characters from a string by position:
first_five_chars = "This is a long string."[:5]
    
# Concatenate strings:
superman = "Clark" + " " + "Kent"

The substring was found.


When working with comma-separated value (CSV) data in Python, it's common to have your data in a "list of lists" format, where each item of the inner lists is a string.

If you have numeric data stored as strings, you will sometimes need to remove or replace certain characters before you can convert the strings to numeric types like **int** and **float**.

Strings in Python are sequences, just like lists, which means you can index and slice specific characters from strings the same way you index and slice lists.
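
A brief sketch showing that indexing and slicing work the same way on a string and on a list. The sample value `"c. 1915"` mirrors the date strings used earlier; the variable names are illustrative.

```python
date_string = "c. 1915"
date_list = list(date_string)  # ['c', '.', ' ', '1', '9', '1', '5']

# Indexing and slicing use the same syntax for both types.
first_char = date_string[0]
last_four = date_string[-4:]
list_last_four = date_list[-4:]

print(first_char, last_four, list_last_four)
```

Slicing a string returns a string, while slicing a list returns a list.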

In [6]:
# import the reader function from the csv module
from csv import reader

# use the python built-in function open()
# to open the artworks.csv file
opened_file = open('artworks.csv')

# use csv.reader() to parse the data from
# the opened file
read_file = reader(opened_file)

# use list() to convert the read file
# into a list of lists format
moma = list(read_file)

# remove the first row of the data, which
# contains the column names
moma = moma[1:]

for row in moma:
    nationality = row[2] # Nationality example: (American)	
    nationality = nationality.replace("(", "")
    nationality = nationality.replace(")", "")
    row[2] = nationality
    gender = row[5]
    gender = gender.replace("(", "")
    gender = gender.replace(")", "")
    row[5] = gender
    
for row in moma:
    # fix the capitalization and missing
    # values for the gender column
    gender = row[5]
    
    if not gender:
        gender = "Gender Unknown/Other"
    else:
        gender = gender.title()
        
    row[5] = gender

    # fix the capitalization and missing
    # values for the nationality column
    nationality = row[2]
    
    if not nationality: # True when nationality is an empty string
        nationality = "Nationality Unknown"
    else:
        nationality = nationality.title()
        
    row[2] = nationality
    
def clean_and_convert(date):
    # only clean and convert non-empty strings;
    # empty strings are returned unchanged
    if date != "":
        date = date.replace("(", "")
        date = date.replace(")", "")
        date = int(date)
    return date

for row in moma:
    begin_date = row[3] # Begin Date
    end_date = row[4] # End Date
    
    begin_date = clean_and_convert(begin_date)
    end_date = clean_and_convert(end_date)
    
    row[3] = begin_date
    row[4] = end_date
     
    
for row in moma:
    date = row[6]
    date = process_date(strip_characters(date))
    row[6] = date    
    

# Python Data Analysis Basics: Takeaways

## Syntax

- Insert values into a string in order:

In [37]:
continents = "France is in {} and China is in {}".format("Europe", "Asia")

- Insert values into a string by position:

In [38]:
squares = "{0} times {0} equals {1}".format(3,9)

- Insert values into a string by name:

In [39]:
population = "{name}'s population is {pop} million".format(name="Brazil", pop=209)

- Format specification for precision of two decimal places:

In [40]:
two_decimal_places = "I own {:.2f}% of the company".format(32.5548651132)

- Format specification for comma separator:

In [41]:
india_pop = "The approximate population of {} is {:,}".format("India",1324000000)

- Order for format specification when using precision and comma separator:

In [42]:
balance_string = "Your bank balance is {:,.2f}".format(12345.678)

- The str.format() method allows you to insert values into strings without explicitly converting
them.
- The str.format() method also accepts optional format specifications, which you can use to
format values so they are easier to read.
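
To illustrate the two points above, this sketch compares string concatenation, which needs explicit `str()` calls, with `str.format()`, with and without a format specification. The values are made up for the example.

```python
count = 3
price = 1234.5

# Concatenation requires converting each number explicitly.
concatenated = "I bought " + str(count) + " items for $" + str(price)

# str.format() inserts the values without explicit conversion.
formatted = "I bought {} items for ${}".format(count, price)

# An optional format specification makes the number easier to read.
with_spec = "I bought {} items for ${:,.2f}".format(count, price)

print(formatted)
print(with_spec)
```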

### Reading our MoMA Data Set

In [44]:
from csv import reader

# Read the `artworks_clean.csv` file
opened_file = open('artworks_clean.csv')
read_file = reader(opened_file)
moma = list(read_file)
moma = moma[1:]

# Convert the birthdate values
for row in moma:
    birth_date = row[3]
    if birth_date != "":
        birth_date = int(birth_date)
    row[3] = birth_date
    
# Convert the death date values
for row in moma:
    death_date = row[4]
    if death_date != "":
        death_date = int(death_date)
    row[4] = death_date

# Convert the artwork date values
for row in moma:
    date = row[6]
    if date:
        date = int(date)
        row[6] = date

### Calculating Artist Ages

In [45]:
ages = []

for row in moma:
    date = row[6]
    birth = row[3]
    # isinstance() returns True if the specified object is of the
    # specified type, otherwise False. If the type argument is a tuple,
    # it returns True if the object is one of the types in the tuple.
    if isinstance(birth, int):
        ages.append(date - birth)
    else:
        ages.append(0)

final_ages = []

for age in ages:
    if age > 20:
        final_age = age
    else:
        final_age = "Unknown"
    final_ages.append(final_age)
    


### Converting Ages to Decades

In [46]:
# The final_ages variable is available
# from the previous cell

decades = []

for age in final_ages:
    if age == "Unknown":
        decade = age
    else:
        decade = str(age)
        # As a first step, remove the last digit of every age
        decade = decade[:-1]
        decade = decade + "0s"
    decades.append(decade)

### Summarizing the Decade Data

In [47]:
# The decades variable is available
# from the previous cell
decade_frequency = {}

for d in decades:
    if d not in decade_frequency:
        decade_frequency[d] = 1
    else:
        decade_frequency[d] += 1

### Inserting Variables Into Strings

In [48]:
artist = "Pablo Picasso"
birth_year = 1881
template = "{name}'s birth year is {year}"
output = template.format(name=artist, year=birth_year)
print(output)

Pablo Picasso's birth year is 1881


### Creating an Artist Frequency Table

In [49]:
artist_freq = {}

for row in moma:
    artist = row[1]
    if artist not in artist_freq:
        artist_freq[artist] = 1
    else:
        artist_freq[artist] += 1

### Creating an Artist Summary Function

In [50]:
def artist_summary(artist):
    num_artworks = artist_freq[artist]
    template = "There are {num} artworks by {name} in the data set"
    output = template.format(name=artist, num=num_artworks)
    print(output)

artist_summary("Henri Matisse")

There are 129 artworks by Henri Matisse in the data set


### Formatting Numbers Inside Strings

In [51]:
pop_millions = [
    ["China", 1379.302771],
    ["India", 1281.935991],
    ["USA",  326.625791],
    ["Indonesia",  260.580739],
    ["Brazil",  207.353391],
]
template = "The population of {} is {:,.2f} million"

for country in pop_millions:
    name = country[0]
    pop = country[1]
    output = template.format(name, pop)
    print(output)

The population of China is 1,379.30 million
The population of India is 1,281.94 million
The population of USA is 326.63 million
The population of Indonesia is 260.58 million
The population of Brazil is 207.35 million


### Summarizing Artwork Gender Data

In [52]:
# gender frequency table
gender_freq = {}

for row in moma:
    gender = row[5]
    if gender not in gender_freq:
        gender_freq[gender] = 1
    else:
        gender_freq[gender] += 1

for gender, num in gender_freq.items():
    template = "There are {n:,} artworks by {g} artists"
    print(template.format(g=gender, n=num))

There are 2,443 artworks by Female artists
There are 13,491 artworks by Male artists
There are 791 artworks by Gender Unknown/Other artists


# Object-Oriented Python: Takeaways
## Syntax

- Define an empty class:

In [53]:
class MyClass:
    pass

- Instantiate an object of a class:

In [57]:
class MyClass:
    pass

mc_1 = MyClass()
type(mc_1)

__main__.MyClass

- Define an init function in a class to assign an attribute at instantiation:

In [59]:
class MyClass:
    def __init__(self, param_1):
        self.attribute_1 = param_1

mc_2 = MyClass("arg_1")
type(mc_2)

__main__.MyClass

- Define a method inside a class and call it on an instantiated object:

In [66]:
class MyClass:
    def __init__(self, param_1):
        self.attribute_1 = param_1
    def add_20(self):
        self.attribute_1 += 20

mc_3 = MyClass(10) # mc_3.attribute_1 is 10
mc_3.add_20() # mc_3.attribute_1 is 30

print("Type of mc_3: ", type(mc_3))
print("Type of mc_3.attribute_1: ", type(mc_3.attribute_1))
print("mc_3.attribute_1 value: ", mc_3.attribute_1)

Type of mc_3:  <class '__main__.MyClass'>
Type of mc_3.attribute_1:  <class 'int'>
mc_3.attribute_1 value:  30


- In Object-Oriented Programming, the fundamental building blocks are objects.
    - It differs from procedural programming, where sequential steps are executed.
- An object is an entity that stores data.
- A class describes an object's type. It defines:
    - What data is stored in the object, known as attributes.
    - What actions the object can do, known as methods.
- An attribute is a variable that belongs to an instance of a class.
- A method is a function that belongs to an instance of a class.

- Attributes and methods are accessed using dot notation. Attributes do not use parentheses,
whereas methods do.

- An instance describes a specific example of a class. For instance, in the code x = 3, x is an
instance of the type int.
    - Creating an object is known as instantiation.

- A class definition is code that defines how a class behaves, including all methods and
attributes.
- The init method is a special method that runs at the moment an object is instantiated.
    - The init method ( __init__() ) is one of a number of special methods that Python defines.
- All instance methods must include a first parameter representing the object instance, named self by convention.
- It is convention to start the name of any attributes or methods that aren't intended for external
use with an underscore.
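
The underscore convention can be sketched with a small, hypothetical `Temperature` class (not part of the original lesson). Note that the underscore is only a naming convention: Python does not prevent outside code from accessing these names.

```python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius  # leading underscore: intended for internal use

    def _to_fahrenheit(self):  # internal helper method
        return self._celsius * 9 / 5 + 32

    def describe(self):  # public method intended for external use
        return "{:.1f}F".format(self._to_fahrenheit())

reading = Temperature(25)
description = reading.describe()
print(description)
```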

- Understanding 'self'

In [67]:
class MyClass:
    def first_method():
        print("This is my first method")

my_instance = MyClass()

In [68]:
my_instance.first_method()

TypeError: first_method() takes 0 positional arguments but 1 was given

This error is a bit confusing. It says that one argument was given to `first_method()`, but when we called the method we didn't provide any arguments. It seems like a "phantom" argument is being inserted somewhere. To understand what's happening, let's look at what happens behind the scenes when we call a method.

When we call the `first_method()` method belonging to the `my_instance` object, Python interprets that syntax and adds an argument representing the instance we're calling the method on:

> `my_instance.first_method()` **is equivalent to** `MyClass.first_method(my_instance)`

In [71]:
class MyClass:
    
    def first_method(self):
        return "This is my first method"

my_instance = MyClass()
result = my_instance.first_method()

print(result)

This is my first method
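
The relationship between calling a method on an instance and calling it through the class can be checked directly. This is a small sketch that redefines the class from the cell above so it runs on its own; the names `via_instance` and `via_class` are illustrative.

```python
class MyClass:
    def first_method(self):
        return "This is my first method"

my_instance = MyClass()

# Calling the method on the instance: Python passes my_instance as self.
via_instance = my_instance.first_method()

# Calling the method through the class: we pass the instance ourselves.
via_class = MyClass.first_method(my_instance)

print(via_instance == via_class)
```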


- Creating a Method That Accepts an Argument

In [73]:
class MyClass:
    
    def first_method(self):
        return "This is my first method"
    
    # A method that accepts an argument
    def return_list(self, input_list):
        return input_list

my_instance = MyClass()
result = my_instance.return_list([1, 2, 3])
result

[1, 2, 3]

- Attributes and the Init Method

In [74]:
class MyList:
    def __init__(self, initial_data):
        self.data = initial_data

my_list = MyList([1, 2, 3, 4, 5])

print(my_list.data)

[1, 2, 3, 4, 5]


- Creating an Append Method

In [75]:
class MyList:

    def __init__(self, initial_data):
        self.data = initial_data
        
    # A method that appends an item to the stored list
    def append(self, new_item):
        self.data.append(new_item)
    
my_list = MyList([1, 2, 3, 4, 5])
print(my_list.data)

my_list.append(6)
print(my_list.data)

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6]


- Creating and Updating an Attribute

In [76]:
class MyList:

    def __init__(self, initial_data):
        self.data = initial_data
        # Calculate the initial length. A manual loop would also work:
        #     self.length = 0
        #     for item in self.data:
        #         self.length += 1
        self.length = len(self.data)

    def append(self, new_item):
        self.data = self.data + [new_item]
        # Update the length here
        self.length += 1
        
my_list = MyList([1, 1, 2, 3, 5])
print(my_list.length)
my_list.append(8)
print(my_list.length)

5
6


# Working with Dates and Times in Python: Takeaways

### Importing Modules and Definitions

In [84]:
# Importing a whole module:
import csv
# csv.reader()

# Importing a whole module with an alias:
import csv as c
# c.reader()

# Importing a single definition:
from csv import reader
# reader()

# Importing multiple definitions:
from csv import reader, writer
# reader()
# writer()

# Importing all definitions:
from csv import *

### Working with the datetime Module

In [86]:
# All examples below presume the following import code:
import datetime as dt

# Creating a datetime.datetime object given a year, month, and day:
eg_1 = dt.datetime(1985, 3, 13)

# Creating a datetime.datetime object from a string:
eg_2 = dt.datetime.strptime("24/12/1984", "%d/%m/%Y")

# Converting a datetime.datetime object to a string:
dt_object = dt.datetime(1984, 12, 24)
dt_string = dt_object.strftime("%d/%m/%Y")

# Instantiating a datetime.time object:
eg_3 = dt.time(hour=0, minute=0, second=0, microsecond=0)

# Retrieving a part of a date stored in the datetime.datetime object:
eg_1.day

# Creating a datetime.time object from a datetime.datetime object:
d2_dt = dt.datetime(1946, 9, 10)
d2 = d2_dt.time()

# Creating a datetime.time object from a string:
d3_str = "17 February 1963"
d3_dt = dt.datetime.strptime(d3_str, "%d %B %Y")
d3 = d3_dt.time()

# Instantiating a datetime.timedelta object:
eg_4 = dt.timedelta(weeks=3)

# Adding a time period to a datetime.date object
# (datetime.datetime objects work the same way):
d1 = dt.date(1963, 2, 26)
d1_plus_1wk = d1 + dt.timedelta(weeks=1)

- The datetime module contains the following classes:
    - datetime.datetime: For working with date and time data
    - datetime.date: For working with date data only
    - datetime.time: For working with time data only
    - datetime.timedelta: For representing time periods
- Time objects behave similarly to datetime objects for the following reasons:
    - They have attributes like time.hour and time.second that you can use to access
individual time components.
    - They have a time.strftime() method, which you can use to create a formatted string
representation of the object.

- The timedelta type represents a period of time, e.g. 30 minutes or two days.
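
A minimal sketch of the time attributes and the `strftime()` method mentioned above. The variable names and the sample time are made up for illustration.

```python
import datetime as dt

lunch = dt.time(hour=12, minute=30, second=15)

# Individual components are available as attributes.
hour_part = lunch.hour
second_part = lunch.second

# strftime() builds a formatted string representation.
formatted = lunch.strftime("%H:%M")

print(hour_part, second_part, formatted)
```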


#### [`strftime()` and `strptime()` Format Codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes)

##### Operations between timedelta and datetime objects (datetime can be substituted with date; time objects do not support these operations):
- Operation
    - `datetime - datetime`: Calculate the time between two specific dates/times.
        - *Resultant Type*: timedelta
    - `datetime - timedelta`: Subtract a time period from a date or time.
        - *Resultant Type*: datetime
    - `datetime + timedelta`: Add a time period to a date or time.
        - *Resultant Type*: datetime
    - `timedelta + timedelta`: Add two periods of time together.
        - *Resultant Type*: timedelta
    - `timedelta - timedelta`: Calculate the difference between two time periods.
        - *Resultant Type*: timedelta
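
The operations listed above can be sketched as follows, with each result's type noted in a comment. The dates are made-up examples. The last part checks, as a side note, that plain `datetime.time` objects raise a `TypeError` when subtracted from each other.

```python
import datetime as dt

start = dt.datetime(1963, 2, 26, 9, 30)
end = dt.datetime(1963, 3, 5, 12, 0)

elapsed = end - start                      # datetime - datetime -> timedelta
earlier = start - dt.timedelta(days=1)     # datetime - timedelta -> datetime
later = start + dt.timedelta(weeks=1)      # datetime + timedelta -> datetime
total = dt.timedelta(hours=1) + dt.timedelta(minutes=30)  # -> timedelta
gap = dt.timedelta(days=2) - dt.timedelta(hours=12)       # -> timedelta

# datetime.time objects do not support this arithmetic.
try:
    dt.time(12, 0) - dt.time(9, 30)
    time_arithmetic_ok = True
except TypeError:
    time_arithmetic_ok = False

print(elapsed, earlier, later, total, gap, time_arithmetic_ok)
```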