Sample datasets from The Museum of Modern Art (MoMA) [via GitHub](https://github.com/MuseumofModernArt/collection). 

MoMA updates these files regularly, and they are now very large, so I have provided more manageable versions on Canvas for you to download in the `artists_artworks.zip` file:
https://utexas.instructure.com/courses/1216881/files/folder/Week_8

Download the zip file and unzip its contents into the `/sharedfolder/` directory on your desktop.

If you want to revisit this assignment with the current MoMA files, you can get them here:  
- [Artists.csv](https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artists.csv)
- [Artworks.csv](https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv)

### Artists Data

In [3]:
import csv

artist_csv_path = "/sharedfolder/Artists.csv"
artist_table = []

with open(artist_csv_path) as fi:
    csv_input = csv.reader(fi)
    for row in csv_input:
        artist_table.append(row)

artist_header = artist_table[0]
artist_table.remove(artist_table[0])

artist_header

['\ufeffConstituentID',
 'DisplayName',
 'ArtistBio',
 'Nationality',
 'Gender',
 'BeginDate',
 'EndDate',
 'Wiki QID',
 'ULAN']
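The `\ufeff` in front of the first label is a byte order mark (BOM) stored at the very start of the file. The following is a minimal sketch, assuming the file is UTF-8 with a BOM, of how opening it with `encoding='utf-8-sig'` drops that character. The temporary sample file and the names `sample_path`, `header_with_bom`, and `header_clean` are stand-ins for illustration, not part of the assignment.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for Artists.csv: a tiny UTF-8 file that starts with a BOM.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_artists.csv")
with open(sample_path, "w", encoding="utf-8-sig", newline="") as fo:
    fo.write("ConstituentID,DisplayName\r\n7019,Rudolph Von Ripper\r\n")

# Plain UTF-8 read: the BOM stays attached to the first column label.
with open(sample_path, encoding="utf-8", newline="") as fi:
    header_with_bom = next(csv.reader(fi))

# 'utf-8-sig' removes the BOM while decoding.
with open(sample_path, encoding="utf-8-sig", newline="") as fi:
    header_clean = next(csv.reader(fi))

print(header_with_bom)
print(header_clean)
```

The rest of this guide keeps the default `open()` call, so the outputs shown still include the BOM.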

Now, check the length of the table, then enter an index value in brackets to look at an entry.

In [4]:
print(len(artist_table))

artist_table[6310]

15228


['7019',
 'Rudolph Von Ripper',
 'American, 1905–1960',
 'American',
 'Male',
 '1905',
 '1960',
 '',
 '']

We’ve just copied all the data from a CSV-formatted spreadsheet and turned it into a format Python can easily work with: a list of lists of strings. Let’s walk through the above a step at a time, this time loading MoMA’s artwork metadata.

We began by importing the `csv` module, Python’s built-in tool for reading and writing CSV files, with `import csv`.

### Artworks Data

Next we assign our pathname to the `artwork_csv_path` variable and initialize an empty list called `artwork_table`.

This will become our *list of lists*, Python’s version of a table.

In [5]:
artwork_csv_path = "/sharedfolder/Artworks.csv"
artwork_table = []

Then we create a file object `fi` that points to our spreadsheet. We pass that file object to `csv.reader` and assign the new reader object to `csv_file`. Finally, using a for loop, we iterate through the reader and add each row (represented by a list) to the master list `artwork_table`.

In [6]:
with open(artwork_csv_path) as fi:
    csv_file = csv.reader(fi)
    for row in csv_file:
        artwork_table.append(row)

Because this table uses column labels in the first row, we’ll save those labels to the variable `artwork_header` and remove that row from the table.

In [7]:
artwork_header = artwork_table[0]
artwork_table.remove(artwork_table[0])
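As an alternative sketch of the same header/data split, the reader object can hand over its first row with `next()` before the remaining rows are collected. The in-memory text below is a toy stand-in for `Artworks.csv` (three columns only), and `toy_header` and `toy_table` are illustrative names.

```python
import csv
import io

# Toy CSV text standing in for Artworks.csv, trimmed to three columns.
csv_text = "Title,Artist,Date\r\nYouth,Louise Bourgeois,1941-1944\r\n"

reader = csv.reader(io.StringIO(csv_text))
toy_header = next(reader)   # consume the first row: the column labels
toy_table = list(reader)    # the remaining rows become the list of lists

print(toy_header)
print(toy_table)
```

Either approach ends with the labels in one list and the data rows in another; the choice is a matter of taste.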

Finally, let’s look at our list of column titles …

In [8]:
artwork_header

['\ufeffTitle',
 'Artist',
 'ConstituentID',
 'ArtistBio',
 'Nationality',
 'BeginDate',
 'EndDate',
 'Gender',
 'Date',
 'Medium',
 'Dimensions',
 'CreditLine',
 'AccessionNumber',
 'Classification',
 'Department',
 'DateAcquired',
 'Cataloged',
 'ObjectID',
 'URL',
 'ThumbnailURL',
 'Circumference (cm)',
 'Depth (cm)',
 'Diameter (cm)',
 'Height (cm)',
 'Length (cm)',
 'Weight (kg)',
 'Width (cm)',
 'Seat Height (cm)',
 'Duration (sec.)']

… as well as a row in our table.

In [9]:
artwork_table[60946]

['Youth',
 'Louise Bourgeois',
 '710',
 '(American, born France. 1911–2010)',
 '(American)',
 '(1911)',
 '(2010)',
 '(Female)',
 '1941-1944',
 'Etching, drypoint, and aquatint',
 'plate: 5 1/16 x 7 9/16" (12.9 x 19.2 cm); sheet: 9 1/16 x 12 9/16" (23 x 31.9 cm)',
 'Gift of the artist',
 '237.1992.10',
 'Print',
 'Prints & Illustrated Books',
 '1992-05-20',
 'Y',
 '64962',
 'http://www.moma.org/collection/works/64962',
 'http://www.moma.org/media/W1siZiIsIjE2NjUxNyJdLFsicCIsImNvbnZlcnQiLCItcmVzaXplIDMwMHgzMDBcdTAwM2UiXV0.jpg?sha=8697b75874b02a22',
 '',
 '',
 '',
 '12.9',
 '',
 '',
 '19.2',
 '',
 '']

 **Tip:** Python will ignore any text following the “#” character on a line, which we can use to add explanatory comments within our code. Here are a couple lines from the snippet above followed by example notes.
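A minimal sketch of what such comments might look like, reusing the shape of the loading snippet. The in-memory sample and the `demo_` names are stand-ins so that the block runs on its own without touching `artwork_table`.

```python
import csv
import io

# A toy in-memory file stands in for Artworks.csv.
fi = io.StringIO("Title,Artist\r\nYouth,Louise Bourgeois\r\n")

demo_table = []                  # start with an empty list of rows
csv_file = csv.reader(fi)        # wrap the file object in a CSV reader
for row in csv_file:             # each row arrives as a list of strings
    demo_table.append(row)

demo_header = demo_table[0]      # the first row holds the column labels
del demo_table[0]                # drop the label row from the data
```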


#### Quick Assignment 1
Write a piece of code that prints each column label in `artist_header` and `artwork_header` next to its index in the list, beginning from zero as usual. You may want to keep this reference handy for the next few exercises.

In [11]:
print('Artists\n')

for i in range(len(artist_header)):
    print(str(i) + ' ' + artist_header[i])

print('\nArtworks\n')

for i in range(len(artwork_header)):
    print(str(i) + ' ' + artwork_header[i])

Artists

0 ﻿ConstituentID
1 DisplayName
2 ArtistBio
3 Nationality
4 Gender
5 BeginDate
6 EndDate
7 Wiki QID
8 ULAN

Artworks

0 ﻿Title
1 Artist
2 ConstituentID
3 ArtistBio
4 Nationality
5 BeginDate
6 EndDate
7 Gender
8 Date
9 Medium
10 Dimensions
11 CreditLine
12 AccessionNumber
13 Classification
14 Department
15 DateAcquired
16 Cataloged
17 ObjectID
18 URL
19 ThumbnailURL
20 Circumference (cm)
21 Depth (cm)
22 Diameter (cm)
23 Height (cm)
24 Length (cm)
25 Weight (kg)
26 Width (cm)
27 Seat Height (cm)
28 Duration (sec.)
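The same index-plus-label listing can also be written with the built-in `enumerate()`, which yields each position together with its item. This is a sketch on a shortened copy of the artist labels; `labels` and `indexed_lines` are illustrative names.

```python
# Shortened copy of the artist labels so the sketch runs on its own.
labels = ['ConstituentID', 'DisplayName', 'ArtistBio', 'Nationality']

indexed_lines = []
for i, label in enumerate(labels):      # i counts up from zero
    indexed_lines.append(str(i) + ' ' + label)

print('\n'.join(indexed_lines))
```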


#### Quick Assignment 2
Write a piece of code that creates a new table (i.e., list of lists) containing only artists born in the 1880s.


In [12]:
born_1880s = []

for row in artist_table:
    if 1880 <= int(row[5]) <= 1889:
        born_1880s.append(row)
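For comparison, the same filter can be sketched as a list comprehension. The rows below are simplified to three columns (name, birth year, death year) using artists who appear elsewhere in this guide, so the birth year sits at index `1` here rather than index `5`.

```python
# Simplified rows: DisplayName, BeginDate, EndDate.
# Real rows in artist_table have nine columns, with BeginDate at index 5.
toy_artists = [
    ['Erik Gunnar Asplund', '1885', '1940'],
    ['Erich Mendelsohn', '1887', '1953'],
    ['Frank Lloyd Wright', '1867', '1959'],
    ['Rudolph Von Ripper', '1905', '1960'],
]

# Keep only rows whose birth year falls between 1880 and 1889.
toy_born_1880s = [row for row in toy_artists if 1880 <= int(row[1]) <= 1889]
toy_names = [row[0] for row in toy_born_1880s]
```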

#### Average Artist Age
Now that we’ve defined a meaningful subset of our data, let’s see what we can do with it. For instance, what was the mean life span of artists born in the 1880s (who happen to be included in MoMA's collections)?

In [13]:
lifespans_1880s = []

for row in born_1880s:
    lifespans_1880s.append(int(row[6]) - int(row[5]))

lifespans_1880s

[80,
 55,
 90,
 65,
 82,
 88,
 76,
 77,
 78,
 66,
 80,
 89,
 86,
 56,
 71,
 66,
 78,
 65,
 72,
 66,
 66,
 36,
 77,
 86,
 43,
 52,
 86,
 73,
 89,
 88,
 76,
 79,
 78,
 34,
 50,
 50,
 87,
 71,
 67,
 60,
 52,
 57,
 81,
 84,
 85,
 55,
 75,
 74,
 83,
 85,
 68,
 58,
 70,
 85,
 58,
 71,
 98,
 87,
 62,
 90,
 84,
 74,
 45,
 103,
 84,
 86,
 78,
 75,
 84,
 93,
 64,
 85,
 85,
 98,
 65,
 63,
 63,
 56,
 94,
 52,
 74,
 71,
 55,
 41,
 75,
 66,
 81,
 90,
 62,
 85,
 80,
 79,
 67,
 60,
 83,
 49,
 81,
 77,
 73,
 74,
 77,
 78,
 83,
 82,
 71,
 80,
 84,
 71,
 51,
 53,
 62,
 59,
 -1881,
 60,
 58,
 98,
 74,
 76,
 72,
 59,
 81,
 -1880,
 87,
 91,
 57,
 80,
 93,
 47,
 40,
 94,
 86,
 59,
 70,
 46,
 77,
 75,
 74,
 85,
 64,
 87,
 70,
 87,
 78,
 85,
 89,
 86,
 85,
 96,
 79,
 77,
 76,
 79,
 56,
 73,
 73,
 76,
 83,
 67,
 71,
 58,
 66,
 80,
 95,
 72,
 89,
 47,
 58,
 87,
 66,
 74,
 94,
 78,
 76,
 90,
 83,
 84,
 94,
 64,
 40,
 56,
 53,
 83,
 73,
 69,
 78,
 65,
 80,
 38,
 81,
 75,
 -1882,
 77,
 80,
 89,
 89,
 27,
 72,
 83,


If you scroll through your list of lifespans, you’ll see occasional negative numbers (e.g., “-1887”). Since missing values are represented by “0,” if no death date is listed we’ll end up subtracting an artist’s birth year from zero. Let’s amend our code to leave out these rows.


In [14]:
lifespans_1880s = []

for row in born_1880s:
    age = int(row[6])-int(row[5])
    if age > 0:
        lifespans_1880s.append(age)

lifespans_1880s

[80,
 55,
 90,
 65,
 82,
 88,
 76,
 77,
 78,
 66,
 80,
 89,
 86,
 56,
 71,
 66,
 78,
 65,
 72,
 66,
 66,
 36,
 77,
 86,
 43,
 52,
 86,
 73,
 89,
 88,
 76,
 79,
 78,
 34,
 50,
 50,
 87,
 71,
 67,
 60,
 52,
 57,
 81,
 84,
 85,
 55,
 75,
 74,
 83,
 85,
 68,
 58,
 70,
 85,
 58,
 71,
 98,
 87,
 62,
 90,
 84,
 74,
 45,
 103,
 84,
 86,
 78,
 75,
 84,
 93,
 64,
 85,
 85,
 98,
 65,
 63,
 63,
 56,
 94,
 52,
 74,
 71,
 55,
 41,
 75,
 66,
 81,
 90,
 62,
 85,
 80,
 79,
 67,
 60,
 83,
 49,
 81,
 77,
 73,
 74,
 77,
 78,
 83,
 82,
 71,
 80,
 84,
 71,
 51,
 53,
 62,
 59,
 60,
 58,
 98,
 74,
 76,
 72,
 59,
 81,
 87,
 91,
 57,
 80,
 93,
 47,
 40,
 94,
 86,
 59,
 70,
 46,
 77,
 75,
 74,
 85,
 64,
 87,
 70,
 87,
 78,
 85,
 89,
 86,
 85,
 96,
 79,
 77,
 76,
 79,
 56,
 73,
 73,
 76,
 83,
 67,
 71,
 58,
 66,
 80,
 95,
 72,
 89,
 47,
 58,
 87,
 66,
 74,
 94,
 78,
 76,
 90,
 83,
 84,
 94,
 64,
 40,
 56,
 53,
 83,
 73,
 69,
 78,
 65,
 80,
 38,
 81,
 75,
 77,
 80,
 89,
 89,
 27,
 72,
 83,
 73,
 36,
 61,
 57,
 92,

Now that we have a list of valid integers, all we need to do is calculate the mean. Below we divide the sum of the list (which we cast as a float) by its length to get about 72.85 years.

In [15]:
float(sum(lifespans_1880s)) / len(lifespans_1880s)

72.8496993987976

That format is a bit verbose for a simple task like this, so to make life easier we’ll use the Python package `NumPy`.


In [16]:
import numpy
numpy.mean(lifespans_1880s)

72.849699398797597

**Tip:** The code above imports the entire `numpy` package. Python also lets us import packages’ individual functions to the current environment, which can make code more compact.

In [17]:
from numpy import mean
mean(lifespans_1880s)

72.849699398797597

A common convention is to rename `numpy` to `np` at the import step.

In [18]:
import numpy as np
np.mean(lifespans_1880s)

72.849699398797597

This guide will use `numpy.mean()` for the sake of clarity, but feel free to set up your environment however you like.
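If NumPy is not available, the standard library's `statistics` module offers the same calculation. A small sketch using the first five lifespans from the list above as a sample; the variable names are illustrative.

```python
import statistics

# First five lifespans from the list above, used as a small sample.
sample_lifespans = [80, 55, 90, 65, 82]

manual_mean = sum(sample_lifespans) / len(sample_lifespans)   # sum divided by count
stdlib_mean = statistics.mean(sample_lifespans)               # same value via the standard library
stdlib_median = statistics.median(sample_lifespans)           # middle value of the sorted sample
```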


#### Quick Assignment 3
Write a piece of code that creates a new table containing all artworks that include the term “Fluxus” in any metadata field.


In [19]:
fluxus_table = []

for row in artwork_table:
    for cell in row:
        if 'fluxus' in cell.lower():
            if row not in fluxus_table:
                fluxus_table.append(row)
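One possible variation on the search above uses `any()`, which stops at the first matching cell, so the `not in` check is no longer needed to avoid adding a row twice. The rows below are made-up examples with three columns, and `toy_artworks` and `toy_fluxus` are illustrative names. One difference to keep in mind: unlike the original loop, this version would keep two rows that happen to be completely identical.

```python
# Made-up rows (Title, Artist, Medium) for illustration only.
toy_artworks = [
    ['Youth', 'Louise Bourgeois', 'Etching, drypoint, and aquatint'],
    ['Example Fluxus box', 'Various Artists', 'Mixed media'],
    ['Example card', 'Various Artists', 'FLUXUS edition, offset'],
]

toy_fluxus = []
for row in toy_artworks:
    # Lowercase each cell so 'Fluxus', 'FLUXUS', and 'fluxus' all match.
    if any('fluxus' in cell.lower() for cell in row):
        toy_fluxus.append(row)
```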

#### Fluxus Metadata Continued
Now let’s make a master list of entries under “medium” in our Fluxus metadata set.

In [20]:
medium_list = []

for row in fluxus_table:
    medium_list.append(row[9])

len(medium_list)

4861

Let’s look at 10 random samples from the collection, first importing the `random` package.

In [21]:
import random

random.sample(medium_list, 10)

['Offset postcard',
 'Gelatin silver print',
 '(CONFIRM)',
 'Ink and color chart on paper',
 'Typewriting and ink on paper',
 'Typewriting on cardstock with stamps and postal cancellation',
 'Ink on paper',
 '16mm film (black and white, silent)',
 'Super- 8 cartridge',
 'File cabinet drawer, containing objects in various media']

Let’s see what terms appear most frequently in our list of media.

In [22]:
from collections import Counter

c = Counter(medium_list)
c.most_common(10)

[('', 1151),
 ('Gelatin silver print', 477),
 ('(CONFIRM)', 190),
 ('Ink on paper', 134),
 ('Offset', 120),
 ('Offset lithograph', 114),
 ('Offset card', 81),
 ('Letterpress', 75),
 ('Mixed media', 62),
 ('Photocopy', 54)]

Note that 1151 of the Fluxus artworks have no entry for “medium,” and the term “(CONFIRM)” appears 190 times.
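To count only the entries that name a medium, those two values can be filtered out before counting. This is a sketch that assumes blank cells and “(CONFIRM)” are both placeholders rather than real media; the sample list and the names `placeholders`, `known_media`, and `media_counts` are illustrative.

```python
from collections import Counter

# Small sample of medium values taken from the output above.
sample_media = ['', 'Gelatin silver print', '(CONFIRM)', 'Ink on paper',
                'Gelatin silver print', '', 'Offset']

# Assumption: these two values mean "no medium recorded".
placeholders = {'', '(CONFIRM)'}
known_media = [m for m in sample_media if m.strip() not in placeholders]

media_counts = Counter(known_media)
print(media_counts.most_common(2))
```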

#### Quick Assignment 4
Returning to our original MoMA metadata table, write a piece of code that extracts only works created in the 1960s (or another decade of your choosing). Since the date field in MoMA’s metadata doesn’t follow a strictly defined numerical format, you’ll have to think about how to interpret values like “1963,” “1963-5,” “c. 1963,” “c. 1960s,” etc.

In [28]:
art_1960s = []

for row in artwork_table:
    if '196' in row[8]:
        art_1960s.append(row)
        
art_1960s

[['Memorial to the Six Million Jewish Martyrs, project, New York City, New York, Perspective of central pier',
  'Louis I. Kahn',
  '2964',
  '(American, born Estonia. 1901–1974)',
  '(American)',
  '(1901)',
  '(1974)',
  '(Male)',
  '1968',
  'Charcoal and graphite on tracing paper',
  '44 1/2 x 66" (113 x 167.6 cm)',
  'Purchase',
  '3.1997',
  'Architecture',
  'Architecture & Design',
  '1997-01-15',
  'Y',
  '32',
  'http://www.moma.org/collection/works/32',
  'http://www.moma.org/media/W1siZiIsIjE3MyJdLFsicCIsImNvbnZlcnQiLCItcmVzaXplIDMwMHgzMDBcdTAwM2UiXV0.jpg?sha=84cc6597561b2de4',
  '',
  '',
  '',
  '113.0',
  '',
  '',
  '167.6',
  '',
  ''],
 ['Yale University, Art and Architecture Building, New Haven, Connecticut, Elevation',
  'Paul Rudolph',
  '5076',
  '(American, 1918–1997)',
  '(American)',
  '(1918)',
  '(1997)',
  '(Male)',
  '1958–1964',
  'Graphite and colored pencil on paper',
  '27 1/4 x 34 1/4" (69.2 x 87 cm)',
  'Gift of the architect',
  '98.1989',
  'Archite
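The substring test `'196' in row[8]` is a quick first pass. A somewhat stricter sketch, offered as one possible interpretation rather than the definitive answer, pulls four-digit years out of the date text with a regular expression and treats the earliest and latest as a range. The helper names `years_in` and `overlaps_1960s` are hypothetical, and a value such as “1896, printed 1965” would still be counted, so the results deserve a manual check.

```python
import re

def years_in(date_text):
    """Return every four-digit year (1000-2099) found in a free-text date string."""
    return [int(y) for y in re.findall(r'(?<!\d)(1\d{3}|20\d{2})(?!\d)', date_text)]

def overlaps_1960s(date_text):
    """True if the date, read as one year or a start-end range, touches 1960-1969."""
    years = years_in(date_text)
    if not years:
        return False
    return min(years) <= 1969 and max(years) >= 1960

# Date formats mentioned in the assignment, plus two from rows shown in this guide.
toy_dates = ['1963', '1963-5', 'c. 1963', 'c. 1960s', '1958–1964', '1941-1944']
toy_1960s = [d for d in toy_dates if overlaps_1960s(d)]
```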

#### Sorting a Table by Column

We can sort a table based on the values in a given column with the `sorted` function and the `itemgetter` tool, which we use to specify the column we’re sorting by. The following sorts the table `art_1960s` by artist name (i.e., index `1` in each row).

In [29]:
from operator import itemgetter
art_1960s_sorted = sorted(art_1960s, key = itemgetter(1))

Since each row is so long, let’s just look at our sorted set of artists. The following notation returns a list of each row’s “Artist” cell, located at index `1`.


In [30]:
[row[1] for row in art_1960s_sorted]

['',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
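The run of empty strings at the top of this output is what sorting does with rows that have no artist listed: an empty string sorts before any name. A small sketch of that behavior, which also passes two indexes to `itemgetter` so that ties on artist are broken by date. The rows are toy examples, and the titles beginning with “Example” are made up.

```python
from operator import itemgetter

# Toy rows (Title, Artist, Date); "Example" titles are made up for illustration.
toy_rows = [
    ['Santa Fe', 'Lee Friedlander', '1995'],
    ['Example B', '', '1964'],
    ['Youth', 'Louise Bourgeois', '1941-1944'],
    ['Example A', 'Lee Friedlander', '1962'],
]

# Sort by artist (index 1), then by date (index 2) when artists match.
toy_sorted = sorted(toy_rows, key=itemgetter(1, 2))
artists_in_order = [row[1] for row in toy_sorted]
```

Note that the date column is compared as text here, which works for plain four-digit years but not for every date format in the collection.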


Let's look at the most common nationalities in our table of 1960s artworks. Here we’re once again using the `Counter` constructor from the `collections` package.

In [31]:
c = Counter([row[4] for row in art_1960s_sorted])
c.most_common(20)

[('(American)', 10783),
 ('(French)', 1414),
 ('(German)', 964),
 ('(British)', 816),
 ('(Italian)', 767),
 ('(Japanese)', 732),
 ('', 418),
 ('(Spanish)', 321),
 ('()', 264),
 ('(Argentine)', 258),
 ('(Swiss)', 251),
 ('(Belgian)', 218),
 ('(Venezuelan)', 200),
 ('(Chilean)', 195),
 ('(American) (American)', 187),
 ('(Brazilian)', 185),
 ('(Dutch)', 181),
 ('(Mexican)', 158),
 ('(Russian)', 155),
 ('(Czech)', 148)]
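The counts above treat values such as `'(American) (American)'` and `'()'` as separate categories. One way to tidy this, sketched below under the assumption that each pair of parentheses holds one artist's nationality, is to extract the text inside the parentheses before counting. Note that this counts artist credits rather than artworks, so a work with two American artists adds two to the total.

```python
import re
from collections import Counter

# Nationality cells copied from the output above.
sample_cells = ['(American)', '(American) (American)', '()', '', '(French)']

nationality_counts = Counter()
for cell in sample_cells:
    # Pull out the text inside each pair of parentheses, skipping empty pairs.
    names = [n for n in re.findall(r'\(([^)]*)\)', cell) if n]
    nationality_counts.update(names)
```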

It’s impossible to memorize the details of every specialized tool available in Python, so you’ll probably end up repeatedly looking up processes like these.


#### Writing CSVs
Now that we’ve filtered and sorted our metadata, let’s export it to a new CSV file called `art_1960s.csv`.

In [32]:
outpath = "/sharedfolder/art_1960s.csv"
# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(outpath, 'w', newline='') as fo:
    csv_writer = csv.writer(fo)
    csv_writer.writerow(artwork_header)
    csv_writer.writerows(art_1960s_sorted)

Note that we call the `writerow` method first to write the header row, then `writerows` to write the actual data.
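To see exactly what those two calls produce, here is a small sketch that writes to an in-memory buffer instead of a file. The buffer stands in for a file opened with `newline=''`, which the `csv` documentation recommends for files passed to `csv.writer`; the toy rows and names are illustrative.

```python
import csv
import io

toy_header = ['Title', 'Artist']
toy_rows = [['Youth', 'Louise Bourgeois'],
            ['Santa Fe', 'Lee Friedlander']]

buffer = io.StringIO()            # in-memory stand-in for an output file
csv_writer = csv.writer(buffer)
csv_writer.writerow(toy_header)   # one call, one row: the column labels
csv_writer.writerows(toy_rows)    # one call, many rows: the data

csv_text = buffer.getvalue()
print(csv_text)
```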


Find the new file in `sharedfolder` and open it in Excel or LibreOffice. Take a few moments to explore the collection.

#### The Dictionary Data Type

So far, when we want to access the “Artist” field in MoMA’s metadata, we’ve been referring to its position in a given row.


In [33]:
row = art_1960s_sorted[7700]
row[1]

'Günter Brus'

This system is straightforward and well-suited for many jobs, but for large, complex projects it can be difficult to keep track of all those index numbers. Instead, we can use a dictionary to reference metadata fields by name rather than list index.

Just as we can refer to an item in a list by enclosing its position in brackets, a dictionary, or dict, uses a chosen key (typically a string or number) to identify each value in a collection. Each key and the value it points to form a key-value pair. Here’s the simplest way to create a new dictionary.

In [34]:
artist_meta = {}
artist_meta['ConstituentID'] = 248
artist_meta['DisplayName'] = 'Richard Avedon'
artist_meta['ArtistBio'] = 'American, 1923–2004'
artist_meta['Nationality'] = 'American'
artist_meta['Gender'] = 'Male'
artist_meta['BeginDate'] = 1923
artist_meta['EndDate'] = 2004
artist_meta['Wiki QID'] = 'Q305497'
artist_meta['ULAN'] = '500013773'

The following is a more compact format for the same key-value assignment.

In [35]:
artist_meta = {'ConstituentID': 248, 'DisplayName': 'Richard Avedon', 'Gender': 'Male', 'BeginDate': 1923, 'EndDate': 2004, 'ULAN': '500013773', 'Wiki QID': 'Q305497', 'ArtistBio': 'American, 1923–2004', 'Nationality': 'American'}

To access a value, enter its key between brackets like so.

In [36]:
artist_meta['DisplayName']

'Richard Avedon'

And note that you can iterate over a dict to view and/or use its keys.

In [37]:
for key in artist_meta:
    print(key + " - " + str(artist_meta[key]))

ArtistBio - American, 1923–2004
Nationality - American
ULAN - 500013773
BeginDate - 1923
DisplayName - Richard Avedon
Gender - Male
EndDate - 2004
ConstituentID - 248
Wiki QID - Q305497
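Two other dict methods are often handy at this point, sketched here on a trimmed copy of the record above (named `avedon` so that it does not overwrite `artist_meta`): `.items()` yields each key together with its value, and `.get()` returns a fallback when a key is absent instead of raising an error.

```python
# Trimmed copy of the Richard Avedon record, under a separate name.
avedon = {'DisplayName': 'Richard Avedon', 'BeginDate': 1923, 'EndDate': 2004}

# .items() yields (key, value) pairs, so no second lookup is needed.
lines = [key + " - " + str(value) for key, value in avedon.items()]

# .get() returns the fallback because this trimmed dict has no 'Wiki QID' key.
wiki_qid = avedon.get('Wiki QID', 'unknown')
```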


Next, let’s create a dict for each artist in MoMA’s artist metadata. Here’s a snippet (repeated from above) that loads `Artists.csv` as a list of lists called `artist_table`.

In [38]:
import csv

artist_csv_path = "/sharedfolder/Artists.csv"
artist_table = []

with open(artist_csv_path) as fi:
    csv_input = csv.reader(fi)
    for row in csv_input:
        artist_table.append(row)

artist_header = artist_table[0]
artist_table.remove(artist_table[0])

artist_header

['\ufeffConstituentID',
 'DisplayName',
 'ArtistBio',
 'Nationality',
 'Gender',
 'BeginDate',
 'EndDate',
 'Wiki QID',
 'ULAN']

Now we’ll use a for loop to iterate through `artist_table`, converting each list of cells to key-value format.

In [39]:
artist_dicts = []

for row in artist_table:
    artist_meta = {}
    artist_meta['ConstituentID'] = row[0]
    artist_meta['DisplayName'] = row[1]
    artist_meta['ArtistBio'] = row[2]
    artist_meta['Nationality'] = row[3]
    artist_meta['Gender'] = row[4]
    artist_meta['BeginDate'] = int(row[5])
    artist_meta['EndDate'] = int(row[6])
    artist_meta['Wiki QID'] = row[7]
    artist_meta['ULAN'] = row[8]
    artist_dicts.append(artist_meta)

The list `artist_dicts` should now contain records for about 15,000 artists.

In [40]:
len(artist_dicts)

15228
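The `csv` module can also build these dicts directly: `csv.DictReader` uses the first row as keys and yields one dict per remaining row. A minimal sketch on toy in-memory text with three of the nine artist columns; note that every value arrives as a string, so numeric fields still need converting.

```python
import csv
import io

# Toy CSV text with three of the nine artist columns, standing in for Artists.csv.
csv_text = "ConstituentID,DisplayName,BeginDate\r\n7019,Rudolph Von Ripper,1905\r\n"

toy_dicts = []
for record in csv.DictReader(io.StringIO(csv_text)):
    record = dict(record)                            # plain dict for display
    record['BeginDate'] = int(record['BeginDate'])   # DictReader yields strings
    toy_dicts.append(record)

print(toy_dicts)
```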

Specifying an index in brackets will return a dict object.

In [41]:
artist_dicts[12007]

{'ArtistBio': 'Swedish, born 1979',
 'BeginDate': 1979,
 'ConstituentID': '35986',
 'DisplayName': 'Klara Liden',
 'EndDate': 0,
 'Gender': 'Female',
 'Nationality': 'Swedish',
 'ULAN': '',
 'Wiki QID': 'Q513343'}

And we can use one of our standard key names to get a particular value.

In [42]:
artist_dicts[12007]['DisplayName']

'Klara Liden'

If we want to create a list of artist names, birth years, etc., we can thus iterate through the `artist_dicts` list and specify the field we want by name.
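A short sketch of that idea, using two records in the same shape as the entries of `artist_dicts` (trimmed to three keys, with values shown earlier in this guide). The names `toy_artist_dicts`, `names`, and `no_end_date` are illustrative.

```python
# Two records shaped like the entries of artist_dicts, trimmed to three keys.
toy_artist_dicts = [
    {'DisplayName': 'Klara Liden', 'BeginDate': 1979, 'EndDate': 0},
    {'DisplayName': 'Richard Avedon', 'BeginDate': 1923, 'EndDate': 2004},
]

# Collect one field from every record by naming its key.
names = [artist['DisplayName'] for artist in toy_artist_dicts]

# An EndDate of 0 marks a missing death year in this dataset.
no_end_date = [a['DisplayName'] for a in toy_artist_dicts if a['EndDate'] == 0]
```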


#### Working with JSON
JSON data is a representation of key-value pairs, very much like a dictionary in Python. For the following example we’ll download a JSON version of the artwork metadata we’ve been working with.

In [43]:
import json
from urllib.request import urlopen

url = "https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.json?raw=true"
json_string = urlopen(url).read().decode('utf8')
json_data = json.loads(json_string)

To view JSON data (as well as dictionaries and just about any other data format), Python offers a “pretty printer” module. Here we are viewing the first 200 artworks in the metadata set.

There are also numerous online tools for prettifying JSON data, such as [these](http://jsonviewer.stack.hu/) [two](http://json.parser.online.fr/beta/).


In [44]:
from pprint import pprint

pprint(json_data[:200])

[{'AccessionNumber': '885.1996',
  'Artist': ['Otto Wagner'],
  'ArtistBio': ['Austrian, 1841–1918'],
  'BeginDate': [1841],
  'Cataloged': 'Y',
  'Classification': 'Architecture',
  'ConstituentID': [6210],
  'CreditLine': 'Fractional and promised gift of Jo Carole and Ronald S. '
                'Lauder',
  'Date': '1896',
  'DateAcquired': '1996-04-09',
  'Department': 'Architecture & Design',
  'Dimensions': '19 1/8 x 66 1/2" (48.6 x 168.9 cm)',
  'EndDate': [1918],
  'Gender': ['Male'],
  'Height (cm)': 48.6,
  'Medium': 'Ink and cut-and-pasted painted pages on paper',
  'Nationality': ['Austrian'],
  'ObjectID': 2,
  'ThumbnailURL': 'http://www.moma.org/media/W1siZiIsIjU5NDA1Il0sWyJwIiwiY29udmVydCIsIi1yZXNpemUgMzAweDMwMFx1MDAzZSJdXQ.jpg?sha=137b8455b1ec6167',
  'Title': 'Ferdinandsbrücke Project, Vienna, Austria, Elevation, preliminary '
           'version',
  'URL': 'http://www.moma.org/collection/works/2',
  'Width (cm)': 168.9},
 {'AccessionNumber': '1.1995',
  'Artist': ['Ch

 {'AccessionNumber': '7.1995.4',
  'Artist': ['Bernard Tschumi'],
  'ArtistBio': ['French and Swiss, born Switzerland 1944'],
  'BeginDate': [1944],
  'Cataloged': 'Y',
  'Classification': 'Architecture',
  'ConstituentID': [7056],
  'CreditLine': 'Purchase and partial gift of the architect in honor of Lily '
                'Auchincloss',
  'Date': '1979',
  'DateAcquired': '1995-01-17',
  'Department': 'Architecture & Design',
  'Dimensions': '48 x 24" (121.9 x 61 cm)',
  'EndDate': [0],
  'Gender': ['Male'],
  'Height (cm)': 121.9,
  'Medium': 'Ink on tracing paper',
  'Nationality': [],
  'ObjectID': 45,
  'ThumbnailURL': 'http://www.moma.org/media/W1siZiIsIjI5NyJdLFsicCIsImNvbnZlcnQiLCItcmVzaXplIDMwMHgzMDBcdTAwM2UiXV0.jpg?sha=9ea63502ed5fc43b',
  'Title': 'The Manhattan Transcripts Project, New York, New York, Episode 3: '
           'The Tower (The Fall)',
  'URL': 'http://www.moma.org/collection/works/45',
  'Width (cm)': 61.0},
 {'AccessionNumber': '7.1995.5',
  'Artist': ['Ber

  'Width (cm)': 94.7739},
 {'AccessionNumber': '22.1980',
  'Artist': ['Roger C. Ferri'],
  'ArtistBio': ['American, 1949–1991'],
  'BeginDate': [1949],
  'Cataloged': 'Y',
  'Classification': 'Architecture',
  'ConstituentID': [1863],
  'CreditLine': 'National Endowment for the Arts Project Funds',
  'Date': '1979',
  'DateAcquired': '1980-01-08',
  'Department': 'Architecture & Design',
  'Dimensions': '55 3/4 x 43 5/8" (141.6 x 110.8 cm)',
  'EndDate': [1991],
  'Gender': ['Male'],
  'Height (cm)': 141.6053,
  'Medium': 'Ink and pastel on paper',
  'Nationality': ['American'],
  'ObjectID': 87,
  'ThumbnailURL': 'http://www.moma.org/media/W1siZiIsIjU4NSJdLFsicCIsImNvbnZlcnQiLCItcmVzaXplIDMwMHgzMDBcdTAwM2UiXV0.jpg?sha=a476797cb482b3ca',
  'Title': 'Pedestrian City project, Hypostyle Courtyard (Developed plan and '
           'section elevations)',
  'URL': 'http://www.moma.org/collection/works/87',
  'Width (cm)': 110.8077},
 {'AccessionNumber': '23.2000',
  'Artist': ['Bernard Tschu

  'Department': 'Architecture & Design',
  'Dimensions': '7 3/4 x 18 3/8" (19.7 x 46.7 cm)',
  'EndDate': [1940],
  'Gender': ['Male'],
  'Height (cm)': 19.7,
  'Medium': 'Graphite on tracing paper',
  'Nationality': ['Swedish'],
  'ObjectID': 145,
  'ThumbnailURL': 'http://www.moma.org/media/W1siZiIsIjEyMzQiXSxbInAiLCJjb252ZXJ0IiwiLXJlc2l6ZSAzMDB4MzAwXHUwMDNlIl1d.jpg?sha=aa073fbe8ace413c',
  'Title': 'Woodland Crematorium, Woodland Cemetery, Stockholm, Sweden, First '
           'version: exterior perspective',
  'URL': 'http://www.moma.org/collection/works/145',
  'Width (cm)': 46.7},
 {'AccessionNumber': '60.1990',
  'Artist': ['Erik Gunnar Asplund'],
  'ArtistBio': ['Swedish, 1885–1940'],
  'BeginDate': [1885],
  'Cataloged': 'Y',
  'Classification': 'Architecture',
  'ConstituentID': [27],
  'CreditLine': 'Gift of Blanchette Hooker Rockefeller, Mrs. Gifford Phillips, '
                'Celeste and Armand P. Bartos, Mrs. S. I. Newhouse, Jr., and '
                'purchase',
  'Dat

  'ConstituentID': [5076],
  'CreditLine': 'Gift of the architect',
  'Date': '1958–1964',
  'DateAcquired': '1989-05-16',
  'Department': 'Architecture & Design',
  'Dimensions': '27 1/4 x 34 1/4" (69.2 x 87 cm)',
  'EndDate': [1997],
  'Gender': ['Male'],
  'Height (cm)': 69.2,
  'Medium': 'Graphite and colored pencil on paper',
  'Nationality': ['American'],
  'ObjectID': 193,
  'ThumbnailURL': 'http://www.moma.org/media/W1siZiIsIjE4MDAiXSxbInAiLCJjb252ZXJ0IiwiLXJlc2l6ZSAzMDB4MzAwXHUwMDNlIl1d.jpg?sha=0e54cf2f46103a1d',
  'Title': 'Yale University, Art and Architecture Building, New Haven, '
           'Connecticut, Elevation',
  'URL': 'http://www.moma.org/collection/works/193',
  'Width (cm)': 87.0},
 {'AccessionNumber': '99.2000',
  'Artist': ['Erich Mendelsohn'],
  'ArtistBio': ['American, born Germany (now Poland). 1887–1953'],
  'BeginDate': [1887],
  'Cataloged': 'Y',
  'Classification': 'Architecture',
  'ConstituentID': [8219],
  'CreditLine': 'Gift of Milton Scheingarten',


  'CreditLine': 'Gift of David Rockefeller, Jr. Fund, Ira Howard Levy Fund, '
                'and Jeffrey P. Klein Purchase Fund',
  'Date': '1915-17',
  'DateAcquired': '1993-05-04',
  'Department': 'Architecture & Design',
  'Dimensions': None,
  'EndDate': [1959],
  'Gender': ['Male'],
  'Medium': 'Lithograph',
  'Nationality': ['American'],
  'ObjectID': 241,
  'ThumbnailURL': None,
  'Title': 'American System-Built Houses for The Richards Company, project, '
           'Milwaukee, Wisconsin, Perspectives, plans, and elevations',
  'URL': None},
 {'AccessionNumber': '155.1993.13',
  'Artist': ['Frank Lloyd Wright'],
  'ArtistBio': ['American, 1867–1959'],
  'BeginDate': [1867],
  'Cataloged': 'Y',
  'Classification': 'Architecture',
  'ConstituentID': [6459],
  'CreditLine': 'Gift of David Rockefeller, Jr. Fund, Ira Howard Levy Fund, '
                'and Jeffrey P. Klein Purchase Fund',
  'Date': '1915–1917',
  'DateAcquired': '1993-05-04',
  'Department': 'Architecture & Design

Much like a dictionary object in Python, a JSON object is made up of key-value pairs that can contain (and be contained in) lists. In this case, MoMA has presented metadata for its artworks as a list of key-value pairs.
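To make that mapping concrete, here is a small sketch that parses a two-record JSON string shaped like MoMA's file, trimmed to a few keys from the records shown in this guide. A JSON array becomes a Python list, a JSON object becomes a dict, and JSON `null` becomes `None`.

```python
import json

# A two-record JSON string shaped like MoMA's file, trimmed to a few keys.
json_text = '''[
  {"Title": "Headsets", "Artist": ["Karen Klugman"], "ObjectID": 44797, "URL": null},
  {"Title": "Santa Fe", "Artist": ["Lee Friedlander"], "ObjectID": 87025, "URL": null}
]'''

records = json.loads(json_text)          # list of dicts
first_artist = records[0]['Artist'][0]   # 'Artist' holds a list, so index into it
url_value = records[1]['URL']            # JSON null arrives as Python None
```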

To see the number of artworks included in the JSON object, use the `len()` function.

In [45]:
len(json_data)

133331

To view the key-value metadata for a random artwork, use `random.choice()`.


In [46]:
import random

random.choice(json_data)

{'AccessionNumber': '48.1991.x1-x2',
 'Artist': ['Karen Klugman'],
 'ArtistBio': ['American, born 1946'],
 'BeginDate': [1946],
 'Cataloged': 'N',
 'Classification': 'Photograph',
 'ConstituentID': [7030],
 'CreditLine': 'The Family of Man Fund',
 'Date': '1989',
 'DateAcquired': '1991-03-14',
 'Department': 'Photography',
 'Dimensions': '28 1/8 x 38 5/16" (71.4 x 97.3 cm)',
 'EndDate': [0],
 'Gender': ['Female'],
 'Height (cm)': 71.4,
 'Medium': 'Chromogenic color print',
 'Nationality': ['American'],
 'ObjectID': 44797,
 'ThumbnailURL': None,
 'Title': 'Headsets',
 'URL': None,
 'Width (cm)': 97.3}

Using bracket notation, we can access individual metadata fields by their keys. Here we display several metadata fields for a randomly chosen artwork.

In [47]:
artwork = random.choice(json_data)

print(artwork['Artist'])
print(artwork['Title'])
print(artwork['Date'])
print(artwork['ObjectID'])
print(artwork['URL'])

['Lee Friedlander']
Santa Fe
1995
87025
None


The following loop will print the `Artist` field for the first 100 artworks in the list.

In [48]:
for item in json_data[:100]:
    pprint(item['Artist'])

['Otto Wagner']
['Christian de Portzamparc']
['Emil Hoppe']
['Bernard Tschumi']
['Emil Hoppe']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Louis I. Kahn']
['Bernard Tschumi']
['Marcel Kammerer']
['Bernard Tschumi']
['Otto Schönthal']
['Bernard Tschumi']
['Otto Schönthal']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard Tschumi']
['Bernard

You can also use `random.sample()` to view artist names for 100 artworks chosen at random.

In [49]:
for item in random.sample(json_data, 100):
    pprint(item['Artist'])

['Henri Michaux']
['Franz Erhard Walther']
['Georges Braque']
['Barney Ingoglia/The New York Times']
['Rufino Tamayo']
['Felice Beato']
['William Wegman']
['Natalia Goncharova']
['Wols (A. O. Wolfgang Schulze)']
['Tony Bevan']
['Eugène Atget']
['Aftograf']
['Dóra Maurer']
['Adolf Fassbender']
['Glenn O. Coleman']
['Various Artists']
['Nicholas Krushenick']
['Kaj Franck']
['Kazimir Malevich']
['Louise Bourgeois']
['Lee Friedlander']
['Garo Antreasian']
['Clinton Adams']
['Ivan Kudriashov']
['Raoul Dufy']
['K.R.H. Sonderborg']
['André Dunoyer de Segonzac']
['Louise Bourgeois']
['George Maciunas']
['Louise Bourgeois']
['Gerald Jackson']
['Palle Nielson']
['Joel Meyerowitz']
['Vanessa Beecroft']
['caraballo-farman']
['Maurice de Vlaminck']
['Georg Baselitz']
['James McGarrell']
['Various Artists']
['Unknown photographer']
['Peter Doig']
['Charles J. Brabin']
['Eduardo Paolozzi']
['Seymour Chwast', 'Richard Mantel', 'Milton Glaser']
['Safi Faye']
['Chris Ofili']
['Oskar Schlemmer']
['Vincen

#### JSON Data to CSV
Next we’ll transfer these metadata fields to CSV format. First, let’s collect the metadata field names from the first record for reference:

In [50]:
header = []

for key in json_data[0]:
    header.append(key)

Next we'll create a list of metadata fields to include in our CSV. These keys will appear at the top of the file as column headers.


In [51]:
column_headers = ['Date', 'Artist', 'Title', 'Medium', 'Nationality', 'ObjectID', 'URL', 'Department']

pprint(column_headers)

['Date',
 'Artist',
 'Title',
 'Medium',
 'Nationality',
 'ObjectID',
 'URL',
 'Department']


Then we’ll use these keys to create a list of rows for our CSV. Since some metadata entries in the JSON object appear as lists rather than strings, we’ll use the `str()` function to reformat each metadata item as we add it to the table.

To avoid slowing things down, we will work with metadata for 20,000 randomly chosen artworks.

In [52]:
meta_table = []

for record in random.sample(json_data, 20000):
    row = []
    for key in column_headers:
        row.append(str(record[key]))
    meta_table.append(row)

pprint(meta_table[0])

['c. 1965',
 "['Alison Knowles']",
 'by Alison Knowles',
 'Series of ten sheets of paper with carbon copy, typewriter, ink, blue '
 'pencil, and graphite, with additions by Dick Higgins and unknown hands',
 "['American']",
 '127405',
 'http://www.moma.org/collection/works/127405',
 'Prints & Illustrated Books']
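As the output shows, `str()` leaves list fields looking like `"['Alison Knowles']"`. One alternative, sketched here as an option rather than a required change, joins list items with a separator and turns missing values into empty cells. The helper `cell_text` is hypothetical, and the record below is a made-up combination of values seen in this guide.

```python
def cell_text(value):
    """Turn one JSON value into a CSV-friendly string."""
    if value is None:
        return ''                                  # None becomes an empty cell
    if isinstance(value, list):
        return '; '.join(str(v) for v in value)    # lists become 'a; b; c'
    return str(value)

# Made-up record for illustration; note that it has no 'Medium' key at all.
record = {'Artist': ['Seymour Chwast', 'Richard Mantel', 'Milton Glaser'],
          'Nationality': [], 'URL': None}

# record.get(key) returns None for a missing key instead of raising KeyError.
row = [cell_text(record.get(key)) for key in ['Artist', 'Nationality', 'URL', 'Medium']]
```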


Finally, we will write our metadata list of lists as a CSV.


In [53]:
import csv

out_path = "/sharedfolder/MoMA_20K.csv"

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(out_path, 'w', newline='') as fo:
    csv_out = csv.writer(fo)
    csv_out.writerow(column_headers)
    csv_out.writerows(meta_table)

Open your CSV in LibreOffice or Excel.