# Day Two: Selection, Sorting, and Nulls

First, import pandas using the conventional alias `pd`.

In [9]:
import pandas as pd
print(pd.__version__)

2.3.3


Read in Day One's `IAAPI_raw_2023-10-05.csv` from your `data` folder.

In [10]:
moma = pd.read_csv("data/IAAPI_raw_2023-10-05.csv", sep="|")
moma

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
3,AAPI-1031,,Q23881684,307448466.0,no2013110010,"Abichandani, Jaishri",Jaishri Abichandani,1969.0,,Indian,,interdisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
4,AAPI-0003,,Q19867429,,,"Acebo Davis, Terry",Terry Acebo Davis,1953.0,,Filipino/a/x,,installation artist;mixed-media artist;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Terry_Acebo_Davis,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
925,AAPI-0890,500524609.0,Q21607972,307424125.0,n2014002953,"Zheng, Chongbin",Chongbin Zheng,1961.0,,Chinese,,installation artist;painter;time-based media a...,Watsonline,https://library.metmuseum.org/search/?searchty...,Met Collection,https://www.metmuseum.org/art/collection/searc...,,,,
926,AAPI-0735,,Q8070726,,,"Zheng, Lianjie",Lianjie Zheng,1962.0,,Chinese,,installation artist;painter;performance artist...,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Zheng_Lianjie,artasiamerica,http://artasiamerica.org/artist/detail/57
927,AAPI-0736,,Q120867919,,,"Zheng, Shengtian",Shengtian Zheng,1938.0,,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
928,AAPI-0889,,Q120868879,,,"Zhong, Yueying",Yueying Zhong,1960.0,,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,


### Pandas Selection: Boolean Indexing

So far, we've been selecting rows and columns based on their *position* in the DataFrame.

However, a more common use case is the desire to filter data by their attributes and other criteria. We will do so in Pandas with Boolean indexing.

[**Boolean indexing**](https://pandas.pydata.org/docs/user_guide/indexing.html#boolean-indexing) uses boolean arrays to filter the data from either a DataFrame or Series. Just like other conditional statements, operators like `==`, `>`, and `<=` can be used.

Booleans are returned by evaluation of conditional statements in Python.
For example, this statement returns True, because 3 is less than 5.

In [67]:
3 < 5

True

We can also use comparators to get Booleans by comparing variables.

In [72]:
x = 10 
x == 15 # This is a comparison, not an assignment. Read as "is x equal to 15?"

False

In the simplest case, a comparison applied to a Series returns *another* Series of Booleans.

In [54]:
my_s = pd.Series([1, 5, 1, -1, 10])
my_s == 1

0     True
1    False
2     True
3    False
4    False
dtype: bool

The original index is maintained, which tells us that the condition was True
for row 0, False for row 1, and so on.

We can then select only the rows with `True` by passing that Boolean array back into the parent Series or DataFrame.

In [55]:
my_s[my_s == 1]

0    1
2    1
dtype: int64

This feature is used on columns in a DataFrame to imitate the filtering behavior of `SELECT ... WHERE` statements in SQL, where *rows* are selected based on values in a specific *column*.

Because each column in a DataFrame represents a different observed variable, we will typically apply boolean indexing to one column at a time. For example, "Birth date" and "Name" have very different domains and ranges.

In [77]:
moma["Birth date"] == 1926

0      False
1      False
2       True
3      False
4      False
       ...  
925    False
926    False
927    False
928    False
929    False
Name: Birth date, Length: 930, dtype: bool

If we use this Boolean Series to select rows in the original DataFrame,
we get a slice of the DataFrame containing the 4 artists born in 1926.

In [79]:
moma[moma["Birth date"] == 1926]

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
39,AAPI-0030,500077806.0,Q7382874,96234311.0,n86835007,"Asawa, Ruth",Ruth Asawa,1926.0,2013.0,Japanese,,draftsperson;painter;printmaker;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Ruth_Asawa,,
698,AAPI-0532,,Q18392945,53065616.0,,"Sekimachi, Kay",Kay Sekimachi,1926.0,,Japanese,,textile/fiber artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,Met Collection,https://www.metmuseum.org/art/collection/searc...,Wikipedia,https://en.wikipedia.org/wiki/Kay_Sekimachi,,
903,AAPI-1287,,Q39744310,251125388.0,n87929671,"Yoshimura, Fumio",Fumio Yoshimura,1926.0,2002.0,Japanese,,sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,Met Collection,https://www.metmuseum.org/art/collection/searc...,Wikipedia,https://en.wikipedia.org/wiki/Fumio_Yoshimura,,


One thing to note is that a comparison such as `==` or `<` against `NaN` or another missing value returns False. (The exception is `!=`, which returns True.)

The 110 rows corresponding to artists with no value in the "Birth date" column are therefore treated as *NOT* born in 1926.
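The behavior of missing values in comparisons can be checked on a small example. This is a minimal sketch using a hypothetical toy Series (`years` is not part of the workshop dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one birth year is missing
years = pd.Series([1926.0, np.nan, 1950.0])

is_1926 = years == 1926      # the NaN row compares as False
not_1926 = years != 1926     # the NaN row compares as True here
missing = years.isna()       # the reliable way to find missing rows

print(is_1926.tolist())   # [True, False, False]
print(not_1926.tolist())  # [False, True, True]
print(missing.tolist())   # [False, True, False]
```

Because a missing value never matches `==`, a filter like this silently leaves those rows out. It is worth checking how many rows are missing before interpreting a filtered result.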

Let's say we want to find out which artists of Japanese descent in our dataset were born before 1942. We need to use **boolean indexing** to select rows that fulfill specific conditions. Namely, we want artists with a Japanese `Ancestry/Heritage` and a `Birth date` before 1942.

To start, this is how boolean indexing can be used to filter the `moma` DataFrame for artists of Japanese descent.

In [21]:
moma["Ancestry/Heritage"] == "Japanese" # This is a boolean index

0      False
1      False
2      False
3      False
4      False
       ...  
925    False
926    False
927    False
928    False
929     True
Name: Ancestry/Heritage, Length: 930, dtype: bool

Next, we use the boolean index to select rows from the original DataFrame.

In [22]:
japanese_artists = moma[moma["Ancestry/Heritage"] == "Japanese"] # This is a DataFrame 
japanese_artists

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
7,AAPI-1158,,Q120717678,,,"Adachi, Margaret",Margaret Adachi,1952.0,,Japanese,,sculptor;installation artist;mixed-media artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
8,AAPI-0005,,Q29110238,7.983150e+21,,"Agematsu, Yuji",Yuji Agematsu,1956.0,,Japanese,,assemblage artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
18,AAPI-0013,500265984.0,Q771277,1.265809e+08,,"Akamu, Nina",Nina Akamu,1955.0,,Japanese,,sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Nina_Akamu,,
19,AAPI-1032,500372656.0,Q61997639,3.612150e+21,,"Akashi, Kelly",Kelly Akashi,1983.0,,Japanese,,sculptor,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
22,AAPI-0015,500442151.0,Q120411767,,,"Allen, John",John Allen,1955.0,,Japanese,,painter,Watsonline,https://library.metmuseum.org/search~S1/X?SEAR...,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
893,AAPI-1109,,Q120077471,,,"Yoneoka, Elaine",Elaine Yoneoka,1955.0,,Japanese,,ceramicist / potter;mixed-media artist,Watsonline,https://library.metmuseum.org/search~S1/X?SEAR...,,,,,,
902,AAPI-0844,,Q120867916,,,"Yoshida, Sekido",Sekido Yoshida,1894.0,1965.0,Japanese,,painter;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
903,AAPI-1287,,Q39744310,2.511254e+08,n87929671,"Yoshimura, Fumio",Fumio Yoshimura,1926.0,2002.0,Japanese,,sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,Met Collection,https://www.metmuseum.org/art/collection/searc...,Wikipedia,https://en.wikipedia.org/wiki/Fumio_Yoshimura,,
904,AAPI-0717,,Q7982838,,,"Yoshimura, Wendy",Wendy Masako Yoshimura,1943.0,,Japanese,,painter,Watsonline,https://library.metmuseum.org/search~S1/X?SEAR...,,,Wikipedia,https://en.wikipedia.org/wiki/Wendy_Yoshimura,,


**Q:** How can we find the artists born before 1942?

In [23]:
moma[moma["Birth date"] < 1942]

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
2,AAPI-0002,500116914.0,Q7426381,9.654797e+07,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
10,AAPI-1159,500464887.0,Q5292140,7.645150e+21,no2017159440,"Ahn, Don",Don Ahn,1937.0,2013.0,Korean,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Don_Ahn,,
13,AAPI-0009,,Q98505561,,,"Ahn, Young-Il",Young-Il Ahn,1934.0,2020.0,Korean,,painter,Watsonline,https://library.metmuseum.org/search/?searchty...,,,Wikipedia,https://en.wikipedia.org/wiki/Young-Il_Ahn,,
16,AAPI-0012,,Q4997320,5.234150e+21,no2017156974,"Akaji, Bumpei",Bumpei Akaji,1921.0,2002.0,Japanese;Hawaiian (Kamaʻāina),,sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Bumpei_Akaji,,
23,AAPI-0016,,Q120717733,,,"Alvarado, Ricardo Ocreto",Ricardo Ocreto Alvarado,1914.0,1976.0,Filipino/a/x,,photographer,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
917,AAPI-0057,500107140.0,Q163298,9.464502e+07,n83158786,"Zhang, Daqian",Daqian Zhang,1899.0,1893.0,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search/?searchty...,,,Wikipedia,https://en.wikipedia.org/wiki/Chang_Dai-chien,,
921,AAPI-0058,,Q28663102,,,"Zhang, Shuqi",Shuchi (Shuqi) Chang (Zhang),1900.0,1957.0,Chinese,,painter;printmaker;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/X?SEAR...,Met Collection,https://www.metmuseum.org/art/collection/searc...,Wikipedia,https://en.wikipedia.org/wiki/Zhang_Shuqi,,
923,AAPI-0070,,Q120077457,,,"Zhao, Chunxiang",Chung-Hsiang (Chunxiang) Chao (Zhao),1910.0,1991.0,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/X?SEAR...,,,,,artasiamerica,http://artasiamerica.org/artist/detail/50
927,AAPI-0736,,Q120867919,,,"Zheng, Shengtian",Shengtian Zheng,1938.0,,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,


This is getting us closer, but we're not quite there yet. We know how to find
rows with Japanese artists, and we know how to find rows of artists born before
1942, but we want the *intersection* of these two groups. That is, we want
to find the rows where both conditions are True.

### Compound boolean indexing
We want a DataFrame with both conditions combined. That is, we want to find the rows where the artist is of Japanese descent **AND** was born before 1942. This is where compound boolean indexing comes in.

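We can use `|` for OR and `&` for AND. A compound boolean index takes the form
`(CONDITION) | (CONDITION)` or `(CONDITION) & (CONDITION)`. The parentheses are required because `&` and `|` bind more tightly than comparison operators like `<` and `==`.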
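Before applying this to the full dataset, here is a minimal sketch of both operators on a hypothetical toy frame (the `toy` DataFrame and its column names are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame, not the workshop dataset
toy = pd.DataFrame({
    "year": [1911, 1950, 1936, 1990],
    "heritage": ["Japanese", "Japanese", "Korean", "Chinese"],
})

# Each condition is wrapped in its own parentheses
both = (toy["year"] < 1942) & (toy["heritage"] == "Japanese")    # AND
either = (toy["year"] < 1942) | (toy["heritage"] == "Japanese")  # OR

print(both.tolist())    # [True, False, False, False]
print(either.tolist())  # [True, True, True, False]
```

The AND mask is True only where both conditions hold, while the OR mask is True where at least one does.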

In [24]:
(moma["Birth date"] < 1942) & (moma["Ancestry/Heritage"] == "Japanese") # | = or, & = and

0      False
1      False
2      False
3      False
4      False
       ...  
925    False
926    False
927    False
928    False
929     True
Length: 930, dtype: bool

Let's give this index a name, `jn_pre1942_index`, and apply it to the parent DataFrame.

In [58]:
jn_pre1942_index = (moma["Birth date"] < 1942) & (moma["Ancestry/Heritage"] == "Japanese") # | = or, & = and
moma[jn_pre1942_index]

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
25,AAPI-0017,500054317.0,Q19864133,1431348.0,n85027139,"Amino, Leo",Leo Amino,1911.0,1989.0,Japanese,,sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Leo_Amino,,
29,AAPI-0020,500465254.0,Q17421288,,,"Aoki, Toshio",Toshio Aoki,1854.0,1912.0,Japanese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Toshio_Aoki,,
36,AAPI-0025,500123705.0,Q478264,6613.0,n79091248,"Arakawa, Shusaku",Shusaku Arakawa,1936.0,2010.0,Japanese,,architect;conceptual artist;painter;printmaker...,Watsonline,https://library.metmuseum.org/search~S1/?searc...,Met Collection,https://www.metmuseum.org/art/collection/searc...,Wikipedia,https://en.wikipedia.org/wiki/Shusaku_Arakawa,,
39,AAPI-0030,500077806.0,Q7382874,96234311.0,n86835007,"Asawa, Ruth",Ruth Asawa,1926.0,2013.0,Japanese,,draftsperson;painter;printmaker;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Ruth_Asawa,,
121,AAPI-0099,,Q6313941,16883563.0,no2001102256,"Chodos, Junko",Junko Chodos (Takahashi),1939.0,,Japanese,,draftsperson;mixed-media artist;painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/X?SEAR...,,,Wikipedia,https://en.wikipedia.org/wiki/Junko_Chodos,artasiamerica,http://artasiamerica.org/artist/detail/37
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
881,AAPI-0703,500041099.0,Q7686513,108861782.0,n82056657,"Yashima, Taro",Taro Yashima,1908.0,1994.0,Japanese,,illustrator;painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Taro_Yashima,,
889,AAPI-0709,,Q111532416,,,"Yoda, Toshihisa",Toshihisa Yoda,1940.0,,Japanese,,draftsperson;installation artist;painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
902,AAPI-0844,,Q120867916,,,"Yoshida, Sekido",Sekido Yoshida,1894.0,1965.0,Japanese,,painter;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
903,AAPI-1287,,Q39744310,251125388.0,n87929671,"Yoshimura, Fumio",Fumio Yoshimura,1926.0,2002.0,Japanese,,sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,Met Collection,https://www.metmuseum.org/art/collection/searc...,Wikipedia,https://en.wikipedia.org/wiki/Fumio_Yoshimura,,


Looks good? Let's assign this slice to its own variable and use `.info()` to get some information about it.

In [62]:
jn_pre1942 = moma[jn_pre1942_index]
jn_pre1942.info()

<class 'pandas.core.frame.DataFrame'>
Index: 144 entries, 25 to 929
Data columns (total 20 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Match point         144 non-null    object 
 1   ULAN ID             64 non-null     float64
 2   Wikidata ID         144 non-null    object 
 3   VIAF ID             86 non-null     float64
 4   LC ID               70 non-null     object 
 5   Name                144 non-null    object 
 6   Label               142 non-null    object 
 7   Birth date          144 non-null    float64
 8   Death date          110 non-null    float64
 9   Ancestry/Heritage   144 non-null    object 
 10  Indexes             0 non-null      object 
 11  Description         144 non-null    object 
 12  Watsonline          144 non-null    object 
 13  Watsonline URL      144 non-null    object 
 14  Met Collection      23 non-null     object 
 15  Met Collection URL  23 non-null     object 
 16  Wikipedia   

#### Quiz
How many artists of Japanese descent in the dataset were born before 1942? How do you know?

In [66]:
jn_pre1942.shape

(144, 20)
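Another way to answer this kind of question is to count the `True` values in the boolean index itself. The sketch below uses a hypothetical five-element mask in place of `jn_pre1942_index`:

```python
import pandas as pd

# Hypothetical mask standing in for a boolean index such as jn_pre1942_index
mask = pd.Series([True, False, True, True, False])

n_true = int(mask.sum())     # True counts as 1, False as 0
share = float(mask.mean())   # fraction of rows that satisfy the condition

print(n_true)  # 3
print(share)   # 0.6
```

Summing the mask gives the same number as the first element of `.shape` on the filtered DataFrame, without building the filtered DataFrame first.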

### Sorting
In order to sort a DataFrame by a specific column, we can use the [**DataFrame.sort_values()**](https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.sort_values.html#pandas.DataFrame.sort_values) method. `sort_values()` takes a *column name* (or a list of column names) in its `by` argument so we can tell Pandas what to sort the rows by.

In [26]:
moma.sort_values(by="Death date")

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
917,AAPI-0057,500107140.0,Q163298,94645020.0,n83158786,"Zhang, Daqian",Daqian Zhang,1899.0,1893.0,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search/?searchty...,,,Wikipedia,https://en.wikipedia.org/wiki/Chang_Dai-chien,,
319,AAPI-0782,,Q120867937,,,"Kagi, Tameya",Tameya Kagi,1851.0,1894.0,Japanese,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
388,AAPI-0787,,Q120867918,,,"Kobayashi, Senko",Senko Kobayashi,1870.0,1911.0,Japanese,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
29,AAPI-0020,500465254.0,Q17421288,,,"Aoki, Toshio",Toshio Aoki,1854.0,1912.0,Japanese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Toshio_Aoki,,
755,AAPI-0821,,Q81027329,,,"Takahashi, Katsuo",Katsuo Takahashi,1860.0,1917.0,Japanese,,painter;textile/fiber artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
924,AAPI-1052,,Q120717727,,,"Zhao, Pop",Pop Zhao,1963.0,,Chinese,,,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
925,AAPI-0890,500524609.0,Q21607972,307424125.0,n2014002953,"Zheng, Chongbin",Chongbin Zheng,1961.0,,Chinese,,installation artist;painter;time-based media a...,Watsonline,https://library.metmuseum.org/search/?searchty...,Met Collection,https://www.metmuseum.org/art/collection/searc...,,,,
926,AAPI-0735,,Q8070726,,,"Zheng, Lianjie",Lianjie Zheng,1962.0,,Chinese,,installation artist;painter;performance artist...,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Zheng_Lianjie,artasiamerica,http://artasiamerica.org/artist/detail/57
927,AAPI-0736,,Q120867919,,,"Zheng, Shengtian",Shengtian Zheng,1938.0,,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,


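Because `sort_values()` returns a new DataFrame, we can call another method, such as `.head()`, directly on its result. This pattern of applying one method to the output of another using `.` is called **chaining**. It is very common in Pandas, where the same DataFrame or Series will undergo several consecutive transformations.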
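As a minimal sketch of chaining, the example below sorts a hypothetical toy frame and keeps the first two rows in one expression. The `toy` frame and its columns are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the workshop dataset
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "death": [1965.0, np.nan, 1912.0, 2002.0],
})

# Sort, then keep the first two rows, in a single chained expression
earliest = toy.sort_values(by="death").head(2)
print(earliest["name"].tolist())  # ['C', 'A']

# Missing values are placed last by default (na_position="last")
full_order = toy.sort_values(by="death")["name"].tolist()
print(full_order)  # ['C', 'A', 'D', 'B']
```

This also explains why the rows with no "Death date" appear at the bottom of the sorted output above.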

#### Quiz
How can we use filtering, sorting, and `.head()` to find the three youngest artists of Chinese descent in the dataset?

In [27]:
chinese_artists = moma[moma["Ancestry/Heritage"] == "Chinese"]

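First get the artists of Chinese descent, then sort them by `Birth date`.

By default, values are sorted in ascending order, so the first call below returns the three *oldest* artists. To get the three youngest, the second call switches to descending order using the `ascending=False` argument.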

In [28]:
chinese_artists.sort_values(by=["Birth date"]).head(3)

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
894,AAPI-0843,,Q120717654,,,"Yong, Lai",Lai Yong,1840.0,,Chinese,,painter;photographer,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
773,AAPI-0824,,Q27978700,,,"Tape, Mary",Mary Tape,1857.0,1934.0,Chinese,,painter;photographer,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Mary_Tape,,
876,AAPI-0839,,Q109628772,,,"Yang, Ling-fu",Ling-fu Yang,1888.0,1978.0,Chinese,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Yang_Ling-fu,,


In [29]:
chinese_artists.sort_values(by=["Birth date"], ascending=False).head(3)

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
637,AAPI-1246,,Q120077443,,,"Ouyang, Catalina",Catalina Ouyang,1993.0,,Chinese,,sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
722,AAPI-1040,500778584.0,Q76500450,,,"Sin, Wai Kin",,1991.0,,Chinese,,multimedia artist;performance artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,Wikipedia,https://en.wikipedia.org/wiki/Victoria_Sin,,
107,AAPI-0084,,Q120717769,,,"Cheng, Luke Luokun",Luke Luokun Cheng,1991.0,,Chinese,,installation artist;performance artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,


**Q:** Can you use **.loc[]** and compound boolean indexing to find the `Name` and `Description` of all the millennials in the dataset?
For our purposes, let's say a millennial is someone born between 1981 and 1996 inclusive.

First compute boolean indices for each of the two conditions: born >= 1981 AND born <= 1996.

In [30]:
moma["Birth date"] >= 1981

0      False
1      False
2      False
3      False
4      False
       ...  
925    False
926    False
927    False
928    False
929    False
Name: Birth date, Length: 930, dtype: bool

In [31]:
moma["Birth date"] <= 1996

0      True
1      True
2      True
3      True
4      True
       ... 
925    True
926    True
927    True
928    True
929    True
Name: Birth date, Length: 930, dtype: bool

Then, use `&` to find the intersection of the two boolean indexes.
Assign the result to the variable `millenial_index`.

In [32]:
millenial_index = (moma["Birth date"] >= 1981) & (moma["Birth date"] <= 1996) 

Then, assign the millennial rows to `millenials`.

In [33]:
millenials = moma[millenial_index]
millenials

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
9,AAPI-0007,,Q19597819,,,"Aguhar, Mark",Mark Aguhar,1987.0,2012.0,Filipino/a/x,,multidisciplinary artist;multimedia artist;per...,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Mark_Aguhar,,
19,AAPI-1032,500372656.0,Q61997639,3.612150e+21,,"Akashi, Kelly",Kelly Akashi,1983.0,,Japanese,,sculptor,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
24,AAPI-1050,,Q120077483,,,"Amano, Fumi",Fumi Amano,1985.0,,Japanese,,sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
38,AAPI-0027,,Q31087622,3.155584e+08,,"Arunanondchai, Korakrit",Korakrit Arunanondchai,1986.0,,Thai,,multimedia artist;time-based media artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Korakrit_Arunano...,,
54,AAPI-1161,,Q120717730,,,"Bhow, Ragini",Ragini Bhow,1991.0,,Indian,,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
801,AAPI-0637,,Q50825806,1.367160e+21,,"Tse, Ka-Man",Ka-Man Tse,1981.0,,Hongkonger,,photographer;time-based media artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Ka-Man_Tse,,
811,AAPI-0650,,Q77325044,,,"Tuttle, Martha",Martha Tuttle,1989.0,,Chinese,,interdisciplinary artist;painter;sculptor;text...,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
818,AAPI-1262,,Q44627574,2.815290e+19,no2018073615,"Uoo, Stewart",Stewart Uoo,1985.0,,Korean,,painter;sculptor;time-based media artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Stewart_Uoo,,
830,AAPI-0666,,Q111490819,,,Wangshui,Wangshui,1986.0,,Chinese,,installation artist;sculptor;time-based media ...,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/WangShui,,


Now that we've collected the millennials in `millenials`, let's use `.loc[]` and a list of column names to select the `Name` and `Description` columns.

In [34]:
millenials.loc[:, ["Name", "Description"]]

Unnamed: 0,Name,Description
9,"Aguhar, Mark",multidisciplinary artist;multimedia artist;per...
19,"Akashi, Kelly",sculptor
24,"Amano, Fumi",sculptor
38,"Arunanondchai, Korakrit",multimedia artist;time-based media artist
54,"Bhow, Ragini",painter;sculptor
...,...,...
801,"Tse, Ka-Man",photographer;time-based media artist
811,"Tuttle, Martha",interdisciplinary artist;painter;sculptor;text...
818,"Uoo, Stewart",painter;sculptor;time-based media artist
830,Wangshui,installation artist;sculptor;time-based media ...


### Interlude: Dictionaries
[**Dictionaries**](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) are collections of **key-value** pairs indicated by `{}` in the form `{k1: v1, k2: v2, ...}`.

* Dictionary keys must be unique.
* Dictionary values do not need to be.
* If you know your key, dictionaries give you the corresponding value quickly.

Let's create a toy dictionary with patient names as *keys* and their diagnoses as *values*. 

In [5]:
patients = {}
patients["John"] = "kidney stones"
patients["David"] = "fever"
patients["Steven"] = "fever"
print(patients)

{'John': 'kidney stones', 'David': 'fever', 'Steven': 'fever'}


There are two ways to get values out of a dictionary by their keys: using the [] syntax or using a method called .get().

In [6]:
patients["John"]

'kidney stones'

The `[]` lookup syntax will fail with a **KeyError** if you try to look up a key that is not in the dictionary.

In [7]:
patients["Emily"] # This will error out

KeyError: 'Emily'

To prevent your code from crashing with a **KeyError**, use the **.get()** method instead, which will return `None` if the specified key is not in the dictionary.

In [8]:
if patients.get("Emily") is None: # this will NOT crash
    print("Emily is not in the dictionary")
print(patients.get("David"))

Emily is not in the dictionary
fever


Dictionary *keys* must be immutable, but dictionary *values* can be just about anything.

**Hint**: The *mutable* objects you've encountered so far include lists and dictionaries.

You *can* use lists as values in dictionaries, and in fact, it's convenient to do so.

In [9]:
hospital = {"name": ["adam", "yao", "sam"], "ages":[15, 25, 30]}
hospital

{'name': ['adam', 'yao', 'sam'], 'ages': [15, 25, 30]}

### Wait, What Does This Have to Do with Pandas?

We can construct toy **Series** from lists by passing in a list of values and, optionally, an index of the same length. (Remember, a **Series** is a one-dimensional array with a labeled index.)

In [10]:
s = pd.Series(["cat", "dog", "antelope"]) # no index provided, defaults to rangeindex
s

0         cat
1         dog
2    antelope
dtype: object

In [11]:
z = pd.Series(data=["cat", "dog", "antelope"], index=["c", "d", "a"])
z

c         cat
d         dog
a    antelope
dtype: object
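A dictionary can also be passed to `pd.Series()` directly, in which case the keys become the index and the values become the data. A minimal sketch, with a hypothetical `animals` dictionary:

```python
import pandas as pd

# Hypothetical dictionary: keys become the index, values become the data
animals = {"c": "cat", "d": "dog", "a": "antelope"}
from_dict = pd.Series(animals)

print(from_dict.index.tolist())  # ['c', 'd', 'a']
print(from_dict["d"])            # dog
```

Looking up `from_dict["d"]` works much like looking up a key in the original dictionary.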

We can construct **DataFrames** from dictionaries of lists, where each key becomes a column name and each list supplies that column's values. Let's take the `hospital` dictionary from above. Note that the $n$th item in each list corresponds to the $n$th row in the resulting DataFrame.

In [12]:
hospital_frame = pd.DataFrame(data=hospital, index=[101, 102, 103])
hospital_frame

Unnamed: 0,name,ages
101,adam,15
102,yao,25
103,sam,30


In this example, patients are indexed by a fake patient_id number. We can use **.loc[]** to retrieve rows by that index.

In [13]:
hospital_frame.loc[101]

name    adam
ages      15
Name: 101, dtype: object

There are many functions in Pandas that take in dictionaries or lists as **arguments**, so make sure you're comfortable with these data structures. 

## Missing Values
How to best handle missing values depends on your source data and the task you want to complete with your data. Though Pandas supports different missing value indicators, the most frequent, and the one you will observe when reading in empty cells from a file, is `numpy.nan`.

In [35]:
import numpy as np
pet_counts = pd.DataFrame({"name":["Jose", "David", "Rose", np.nan], "pets": [1, np.nan, 3, 5]})
pet_counts

Unnamed: 0,name,pets
0,Jose,1.0
1,David,
2,Rose,3.0
3,,5.0


NaNs do not count towards aggregated metrics like **count()**, **sum()**, or **mean()**.

In [36]:
pet_counts.count()

name    3
pets    3
dtype: int64

In [37]:
pet_counts["pets"].mean() # The missing value will not count towards this average

3.0

### Detecting Missing Values

`np.nan` is **not** `None` or `0`. You will need to use a set of [specialized functions](https://pandas.pydata.org/docs/user_guide/missing_data.html#calculations-with-missing-data) from Pandas to test for or replace it.

In [38]:
pet_counts

Unnamed: 0,name,pets
0,Jose,1.0
1,David,
2,Rose,3.0
3,,5.0


In [39]:
pet_counts.loc[1 , "pets"] == None

False

**pd.isna()** can be used to check a single value.

In [40]:
pd.isna(pet_counts.loc[1, "pets"])

True
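The reason a dedicated function is needed is that `NaN` is not equal to anything, including itself. A minimal sketch:

```python
import numpy as np
import pandas as pd

value = np.nan

print(value == np.nan)   # False: NaN is not equal to anything, itself included
print(value is None)     # False
print(value == 0)        # False
print(pd.isna(value))    # True
print(pd.isna(None))     # True: pd.isna() also treats None as missing
```

So a test like `x == np.nan` can never find missing values, while `pd.isna(x)` can.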

To detect NaNs in a DataFrame or Series, we can use the [**isna()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isna.html#pandas.DataFrame.isna) method that returns a boolean array *of the same size as the input DataFrame*.

In [41]:
pet_counts.isna()

Unnamed: 0,name,pets
0,False,False
1,False,True
2,False,False
3,True,False


In [42]:
pet_counts.isna().sum() # Total number of NaNs per column

name    1
pets    1
dtype: int64

Pandas also has a method for computing the opposite: which entries in a DataFrame are not null. This is [**notna()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.notna.html#pandas.DataFrame.notna).

In [43]:
pet_counts.notna()

Unnamed: 0,name,pets
0,True,True
1,True,False
2,True,True
3,False,True


We can then aggregate the returned frame to count the number of non-null values per column.

In [44]:
pet_counts.notna().sum()

name    3
pets    3
dtype: int64

The Pandas [**fillna()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html) method allows us to replace NaNs with a default value.

In [45]:
pet_counts.fillna(0)

Unnamed: 0,name,pets
0,Jose,1.0
1,David,0.0
2,Rose,3.0
3,0,5.0


Keep in mind that DataFrame methods like `fillna()` return an altered copy of the original DataFrame by default. They do not change the original.

In [46]:
pet_counts

Unnamed: 0,name,pets
0,Jose,1.0
1,David,
2,Rose,3.0
3,,5.0


While 0 might be acceptable for an unknown number of pets (let's say participants were asked to fill in a blank), 0 is *not* an appropriate indicator for an unknown participant name.

**Q:** How can we construct a dictionary that replaces NaNs in the "pets" column with 0 and an unknown name with "UNKNOWN"?

In [47]:
replacements = {"pets":0, "name":"UNKNOWN"} # In column name: replacement value form

In [48]:
pet_counts.fillna(value=replacements) # Use the value keyword to specify the dictionary

Unnamed: 0,name,pets
0,Jose,1.0
1,David,0.0
2,Rose,3.0
3,UNKNOWN,5.0
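
Because `fillna()` returns a copy, the filled values are lost unless the result is stored. A minimal sketch, assuming a hand-built copy of `pet_counts` and a new variable name, `pet_counts_filled`, introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for pet_counts, assumed to match the table shown earlier
pet_counts = pd.DataFrame(
    {"name": ["Jose", "David", "Rose", np.nan], "pets": [1.0, np.nan, 3.0, 5.0]}
)

replacements = {"pets": 0, "name": "UNKNOWN"}

# Store the filled copy under a new name so the original stays available
pet_counts_filled = pet_counts.fillna(value=replacements)

remaining_in_copy = int(pet_counts_filled.isna().sum().sum())  # NaNs left in the copy
remaining_in_original = int(pet_counts.isna().sum().sum())     # NaNs left in the original

print(remaining_in_copy, remaining_in_original)
```

Assigning to a new variable keeps the untouched data around in case the replacement values turn out to be the wrong choice.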


But what if you simply want to remove problematic columns or rows from your dataset?

The [**DataFrame.dropna()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna) method by default removes all rows with one or more missing values.

In [6]:
pet_counts.dropna() # By default this drops all rows with one or more missing values

Unnamed: 0,name,pets
0,Jose,1.0
2,Rose,3.0


If we use the `subset=` argument, we can pass in a list of columns that determines which rows are dropped. For example, if we pass in a list containing `"name"` but not `"pets"`, then rows with nulls in the `name` column will be dropped, but nulls in the `pets` column will be ignored.

In [7]:
pet_counts.dropna(subset=["name"]) # Takes in a list of column names

Unnamed: 0,name,pets
0,Jose,1.0
1,David,
2,Rose,3.0
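
`dropna()` has a few more options worth knowing: `how="all"` drops a row only when every checked value is missing, and `axis="columns"` drops columns instead of rows. The sketch below uses a made-up `survey` frame (not part of the workshop data) to show both:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an id column with no gaps, plus one fully blank response
survey = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "name": ["Jose", "David", np.nan, np.nan],
        "pets": [1.0, np.nan, 5.0, np.nan],
    }
)

# Drop a row only when BOTH name and pets are missing
answered = survey.dropna(subset=["name", "pets"], how="all")

# Drop every column that contains at least one missing value
complete_columns = survey.dropna(axis="columns")

print(answered.index.tolist())
print(complete_columns.columns.tolist())
```

Dropping columns is a blunt tool: a single missing value is enough to remove the whole column, so check the null counts first.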


To finalize our changes to the `pet_counts` frame, we must assign the return value of `pet_counts.dropna()` back to `pet_counts` as follows. Be very cautious when dropping rows and columns, and assign to a new variable if you're not sure.

In [8]:
pet_counts = pet_counts.dropna() # Reassign to keep the result
pet_counts

Unnamed: 0,name,pets
0,Jose,1.0
2,Rose,3.0


If you'd like to learn more about options for handling missing values, the Pandas documentation has
an [official guide](https://pandas.pydata.org/docs/user_guide/missing_data.html) that discusses the topic in depth.

### Putting it Together: Filtering Rows, Selecting Columns

Let's say you want to amend some of this data by providing missing descriptions for artists without them. The easiest place to start would be to identify artists with *Wikipedia* pages, as those will likely have enough information to supply the missing description.

Let's generate a boolean index using **.notna()** and **.isna()** to find artists *with* Wikipedia pages but *without* descriptions.

In [14]:
(moma["Description"].isna()) & (moma["Wikipedia URL"].notna())

0      False
1      False
2      False
3      False
4      False
       ...  
925    False
926    False
927    False
928    False
929    False
Length: 930, dtype: bool

Now, let's apply this selection and restrict it to the `Name`, `Description`, and `Wikipedia URL` columns.

In [15]:
moma[(moma["Description"].isna()) & (moma["Wikipedia URL"].notna())].loc[:, ["Name", "Description", "Wikipedia URL"]]

Unnamed: 0,Name,Description,Wikipedia URL
60,"Bonk, Keiko",,https://en.wikipedia.org/wiki/Keiko_Bonk
112,"Chiang, Fay",,https://en.wikipedia.org/wiki/Fay_Chiang
113,"Chiang, Janice",,https://en.wikipedia.org/wiki/Janice_Chiang
115,"Chin, Frank",,https://en.wikipedia.org/wiki/Frank_Chin
630,"Osorio, Jamaica",,https://en.wikipedia.org/wiki/Jamaica_Osorio


## Renaming Columns and Rows 
Naming variables with spaces (like `"Wikipedia URL"`) is not best practice. Let's go about fixing that.

We can use [**DataFrame.rename()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html) to pass in key-value pairs that map old names to new ones. We can use rename on either row names or column names. 

The most straightforward way to use **.rename()** is to pass in a dictionary of `{'old_name':'new_name','old_name1':'new_name1'...}` mappings as a keyword argument to either `index=` or `columns=`, respectively.

For example, we could rename the `"Wikipedia URL"` and `"Birth date"` columns like this.

In [16]:
new_names = {"Wikipedia URL":"WikiURL", "Birth date":"Birth"} # Create a dictionary
moma.rename(columns=new_names).head(5)

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,WikiURL,Art Asia America,AAA URL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
3,AAPI-1031,,Q23881684,307448466.0,no2013110010,"Abichandani, Jaishri",Jaishri Abichandani,1969.0,,Indian,,interdisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
4,AAPI-0003,,Q19867429,,,"Acebo Davis, Terry",Terry Acebo Davis,1953.0,,Filipino/a/x,,installation artist;mixed-media artist;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Terry_Acebo_Davis,,


**.rename()** ignores keys that do not match an existing row or column label. If the `old_name` in an `old:new` pair doesn't exist, that pair is skipped and no error is raised.

In [17]:
moma.rename(index=new_names).head(5) # Rename rows when we mean to rename columns

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
3,AAPI-1031,,Q23881684,307448466.0,no2013110010,"Abichandani, Jaishri",Jaishri Abichandani,1969.0,,Indian,,interdisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
4,AAPI-0003,,Q19867429,,,"Acebo Davis, Terry",Terry Acebo Davis,1953.0,,Filipino/a/x,,installation artist;mixed-media artist;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Terry_Acebo_Davis,,


Keep this in mind when debugging: the mistake passes silently, without raising an error.
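
If you would rather be told about unmatched names, **.rename()** accepts an `errors="raise"` argument. A minimal sketch on a one-row frame (the frame and the `caught` flag are illustrative, and the URL is a placeholder):

```python
import pandas as pd

df = pd.DataFrame(
    {"Birth date": [1946.0], "Wikipedia URL": ["https://example.org/a"]}
)
new_names = {"Wikipedia URL": "WikiURL", "Birth date": "Birth"}

caught = False
try:
    # Same mistake as above: index= instead of columns=
    df.rename(index=new_names, errors="raise")
except KeyError as err:
    caught = True
    print("rename failed:", err)

# With the correct keyword the rename goes through
renamed = df.rename(columns=new_names, errors="raise")
print(renamed.columns.tolist())
```

With `errors="raise"`, a `KeyError` is raised whenever any key in the mapping is missing from the chosen axis, so typos in column names surface immediately.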

### Batch Renaming With String Functions
There are several column names in this DataFrame with spaces in them. Do we really have to type them all out?

Pandas has special string-processing functions for Series and DataFrames inspired by Python's built-in [string methods](https://docs.python.org/3/library/stdtypes.html#string-methods).

Let's review a few string methods: `str.lower()`, `str.isnumeric()`, `str.replace()`.

`str.lower()` returns a lowercase version of the string.

In [19]:
shouting = "ALL CAPS"
shouting.lower()

'all caps'

`str.isnumeric()` tests whether every character in a string is a numeric character. Any other character, including letters, spaces, a minus sign, or a decimal point, will cause this to return `False`.

In [20]:
shouting.isnumeric()

False

In [21]:
lucky = "12"
lucky.isnumeric()

True
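
A few edge cases are worth trying, since `isnumeric()` checks characters rather than parsing a number. The sample strings below are chosen here for illustration:

```python
samples = ["12", "-12", "1.5", "12 "]

# Map each sample string to the result of isnumeric()
results = {text: text.isnumeric() for text in samples}

for text, is_num in results.items():
    print(repr(text), is_num)
```

Only `"12"` passes; the minus sign, the decimal point, and the trailing space each make the test fail, so `isnumeric()` is not a general "can this be converted to a number" check.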

`str.replace(old, new)` returns a copy of a string with all occurrences of substring `old` replaced by `new`. 


A *substring* is 1 or more consecutive characters inside an existing string.
`"BAG"` is a substring of `"BAGEL"` but `"NEW YORK"` is not.

In [22]:
less_shouting = shouting.replace("ALL", "some")
less_shouting

'some CAPS'

The `str.replace()` function is especially convenient for removing unwanted characters.

In [24]:
has_spaces = "This has spaces"
has_spaces.replace(" ", "_") # Replaces spaces with _

'This_has_spaces'

Wouldn't it be convenient if we could apply these to all of our columns at once?

That's where [`Series.str.replace()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html#pandas.Series.str.replace) comes in.

We can access any DataFrame's column names using the `.columns` attribute.

In [25]:
moma.columns

Index(['Match point', 'ULAN ID', 'Wikidata ID', 'VIAF ID', 'LC ID', 'Name',
       'Label', 'Birth date', 'Death date', 'Ancestry/Heritage', 'Indexes',
       'Description', 'Watsonline', 'Watsonline URL', 'Met Collection',
       'Met Collection URL', 'Wikipedia', 'Wikipedia URL', 'Art Asia America',
       'AAA URL'],
      dtype='object')

Let's use the Pandas version of the `str.replace()` method we already know to replace spaces with the empty string.

In [14]:
moma.columns.str.replace(" ", "")

Index(['Matchpoint', 'ULANID', 'WikidataID', 'VIAFID', 'LCID', 'Name', 'Label',
       'Birthdate', 'Deathdate', 'Ancestry/Heritage', 'Indexes', 'Description',
       'Watsonline', 'WatsonlineURL', 'MetCollection', 'MetCollectionURL',
       'Wikipedia', 'WikipediaURL', 'ArtAsiaAmerica', 'AAAURL'],
      dtype='object')

This returns a **copy** of the column names, so we'll have to assign it back to the DataFrame.

In [13]:
moma.columns = moma.columns.str.replace(" ", "")
moma.head(5)

Unnamed: 0,Matchpoint,ULANID,WikidataID,VIAFID,LCID,Name,Label,Birthdate,Deathdate,Ancestry/Heritage,Indexes,Description,Watsonline,WatsonlineURL,MetCollection,MetCollectionURL,Wikipedia,WikipediaURL,ArtAsiaAmerica,AAAURL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
3,AAPI-1031,,Q23881684,307448466.0,no2013110010,"Abichandani, Jaishri",Jaishri Abichandani,1969.0,,Indian,,interdisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
4,AAPI-0003,,Q19867429,,,"Acebo Davis, Terry",Terry Acebo Davis,1953.0,,Filipino/a/x,,installation artist;mixed-media artist;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Terry_Acebo_Davis,,
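
The `.str` methods can also be chained, for example to lowercase the names and use underscores instead of deleting the spaces. A sketch on an empty frame that borrows three of the original column names (the `snake_case` variable is a name chosen for this example):

```python
import pandas as pd

# Empty frame with a few of the original column names
df = pd.DataFrame(columns=["Match point", "ULAN ID", "Wikipedia URL"])

# Each .str call returns a new Index, so the calls can be chained
snake_case = df.columns.str.lower().str.replace(" ", "_")

df.columns = snake_case  # assign back, as before
print(df.columns.tolist())
```

Underscores keep multi-word names readable, which is a matter of preference rather than a requirement.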


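As a sneak peek at passing functions as arguments, note that you can also pass a function like `str.lower` (without parentheses) to the `index=` or `columns=` argument of **.rename()**, and it will be applied to every label.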
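
Here is a minimal sketch of passing a function to **.rename()**, using a one-row frame built from values shown earlier (the frame itself is an illustration, not the full dataset):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Abad, Pacita"], "Birthdate": [1946.0]})

# Pass the function itself (no parentheses); rename calls it on each column label
lowered = df.rename(columns=str.lower)

print(lowered.columns.tolist())
```

Like the dictionary form, this returns a renamed copy and leaves `df` unchanged.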

## Writing Out DataFrames
To output a `DataFrame` to disk, we can use one of several methods, including `.to_csv()`, `.to_excel()`, and `.to_sql()`. Let's practice with [`.to_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html).

We'll write out the cleaned version of the `moma` DataFrame.

Create a new folder named `output` in your project folder using the File Browser tab.

In [40]:
moma.to_csv("output/moma_cleaned.csv", index=False) 

This cell will not return any output. Instead, check the contents of the `output` folder and press the refresh symbol if necessary.
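
The folder can also be created from code, and the file read back to confirm it was written. The sketch below works in a temporary directory as a stand-in for your project folder (an assumption made so the example leaves no files behind); in the notebook you would use `Path("output")` directly:

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"Name": ["Abad, Pacita"], "Birthdate": [1946.0]})

project_dir = Path(tempfile.mkdtemp())  # stand-in for the project folder
out_dir = project_dir / "output"
out_dir.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists

out_path = out_dir / "moma_cleaned.csv"
df.to_csv(out_path, index=False)  # index=False leaves out the row labels

# Read the file back to check what was written
round_trip = pd.read_csv(out_path)
print(round_trip)
```

Reading the file back is a cheap sanity check that the column names and row count survived the round trip.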