## Python Data Analysis Basics

We will learn how to:

- Calculate how old the artist was when they created their artwork.
- Analyze and interpret the distribution of artist ages.
- Create functions that summarize our data.
- Print summaries in an easy-to-read way.

DQ provides a dataset called _artworks_clean.csv_, which we will analyze. Note that we won't run the code here because we don't have the dataset available (you have to pay to get it! :/).

__Update:__ I have successfully fixed the 'Date' column using the Python library 'regex'. 
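The cleaning code itself isn't shown here, so the following is only a minimal sketch of one way a regular expression can pull a year out of a messy date string. It assumes raw values such as `"c. 1960"` or `"1965-1970"` and simply keeps the *first* four-digit year. The helper `extract_year` is hypothetical, and the sketch uses the standard library's `re` module rather than the `regex` package.

```python
import re

# Hypothetical pattern: a standalone four-digit year between 1000 and 2999.
YEAR_PATTERN = re.compile(r"\b([12]\d{3})\b")


def extract_year(raw_date):
    """Return the first four-digit year in raw_date as an int, or "" if none.

    Assumption: for ranges like "1965-1970" we keep the first year only.
    """
    match = YEAR_PATTERN.search(raw_date)
    if match is None:
        return ""  # keep missing/unparseable dates as empty strings
    return int(match.group(1))


print(extract_year("c. 1960"))    # 1960
print(extract_year("1965-1970"))  # 1965
```

Keeping the first year of a range is just one choice; averaging the two endpoints would be another.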

In [None]:
from csv import reader

# Read the `artworks_clean.csv` file; the first row is the header
with open('data/artworks_clean.csv') as opened_file:
    moma = list(reader(opened_file))
moma_header = moma[0]
moma_data = moma[1:]

# Convert the birth date values (BeginDate, index 5) to integers;
# missing values stay as empty strings
for row in moma_data:
    birth_date = row[5]
    if birth_date != "":
        birth_date = int(birth_date)
    row[5] = birth_date

# Convert the death date values (index 6); loop over the data rows only,
# because the header's text label can't be converted with int()
for row in moma_data:
    death_date = row[6]
    if death_date != "":
        death_date = int(death_date)
    row[6] = death_date
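Since the real file isn't available, a tiny in-memory CSV can stand in for it to show what `reader` returns. This is a sketch with a hypothetical three-column layout (not the dataset's real column order). It shows that every value arrives as a string, that blanks arrive as `""`, and that the header row holds text labels that `int()` cannot parse.

```python
import io
from csv import reader

# Hypothetical stand-in for data/artworks_clean.csv (three columns only;
# the real file has more columns in a different layout).
sample_csv = io.StringIO(
    "Title,BeginDate,Date\n"
    "Untitled,1898,1968\n"
    "Study,,1931\n"
)

rows = list(reader(sample_csv))
header = rows[0]  # column names, all strings
data = rows[1:]   # one list of strings per artwork

print(header)  # ['Title', 'BeginDate', 'Date']
print(data)    # [['Untitled', '1898', '1968'], ['Study', '', '1931']]

# int() works on the data values but not on the header labels
try:
    int(header[1])
except ValueError as err:
    print("Can't convert the header:", err)
```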

In [None]:
# Convert the artwork creation year (Date, index 8) to integers
for row in moma_data:
    date = row[8]
    if date != "":
        date = int(date)
    row[8] = date
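The three conversion loops (birth date, death date, creation year) all follow the same pattern. A minimal sketch of folding them into one reusable function is below. `convert_columns_to_int` is a hypothetical name, and the toy rows only stand in for `moma_data`.

```python
def convert_columns_to_int(rows, indices):
    """Convert the given column indices to int in place; keep "" for blanks.

    Hypothetical helper: rows is a list of lists (data rows only, no header)
    and indices lists the columns to convert.
    """
    for row in rows:
        for index in indices:
            value = row[index]
            if value != "":
                row[index] = int(value)
    return rows


# Toy rows standing in for moma_data, laid out as [birth, death, date]
rows = [["1898", "1976", "1968"], ["", "", "1931"]]
convert_columns_to_int(rows, [0, 1, 2])
print(rows)  # [[1898, 1976, 1968], ['', '', 1931]]
```

On the real data, the equivalent call would be `convert_columns_to_int(moma_data, [5, 6, 8])`, using the column indices from the cells above.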

We're going to work on calculating the ages at which artists created their pieces of art. We need to subtract the artist's birth year (BeginDate) from the year in which their artwork was created (Date).

Every row has a value for Date, but some rows are missing a BeginDate. When we cleaned BeginDate, we left these missing values as empty strings (""). For these rows we'll use an age of 0 as a placeholder, which we'll replace with something more meaningful later on.

In a handful of cases, the artist's age (according to our dataset) is very low, and in some it is even negative. We could investigate these cases one by one, but since we're looking for a summary, we'll simply categorize any artist younger than 20 as "Unknown". This has the handy side effect of also categorizing artists without a birth year as "Unknown", because their placeholder age of 0 is below 20.

| Year Artwork Created (date) | Birth Year (birth) | age | final_age |
| --------------------------- | ------------------ | --- | --------- |
| 1968 | 1898 | 70 | 70        |
| 1931 |  ""  | 0  | "Unknown" |
| 1972 | 1976 | -4 | "Unknown" |
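To check these rules against the table, here is a minimal sketch that recomputes its three rows. The helper names `artist_age` and `final_age` are hypothetical. The cutoff keeps an age of exactly 20, since the rule only labels artists *younger* than 20 as "Unknown".

```python
def artist_age(date, birth):
    """Age when the artwork was created; 0 is a placeholder for a missing birth year ("")."""
    if isinstance(birth, int):
        return date - birth
    return 0


def final_age(age, min_age=20):
    """Keep plausible ages; label anything under min_age as "Unknown"."""
    return age if age >= min_age else "Unknown"


# The three example rows from the table: (date, birth)
examples = [(1968, 1898), (1931, ""), (1972, 1976)]
for date, birth in examples:
    age = artist_age(date, birth)
    print(date, repr(birth), age, final_age(age))
# 1968 1898 70 70
# 1931 '' 0 Unknown
# 1972 1976 -4 Unknown
```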

In [None]:
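ages = []

for row in moma_data:
    date = row[8]   # year the artwork was created (Date)
    birth = row[5]  # artist's birth year (BeginDate), or "" if missing
    if isinstance(birth, int):
        age = date - birth
    else:
        age = 0  # placeholder for a missing birth year
    ages.append(age)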