# Day Three: DataFrame Inference and Applications

As always, import pandas using the conventional alias `pd` and check the version. Any version newer than 2.0 should do.

In [1]:
import pandas as pd
print(pd.__version__)

2.2.3


## Finishing Up with Missing Values

Let's recreate the toy `pet_counts` DataFrame to finish our discussion of missing values.

In [2]:
import numpy as np
pet_counts = pd.DataFrame({"name":["Jose", "David", "Rose", np.nan], "pets": [1, np.nan, 3, 5]})
pet_counts

Unnamed: 0,name,pets
0,Jose,1.0
1,David,
2,Rose,3.0
3,,5.0


To detect NaNs in a DataFrame or Series, we can use the **isna()** method, which returns a boolean DataFrame or Series *of the same shape as the input*.

In [3]:
pet_counts.isna()

Unnamed: 0,name,pets
0,False,False
1,False,True
2,False,False
3,True,False


Its opposite is the **notna()** method.

In [4]:
pet_counts.notna()

Unnamed: 0,name,pets
0,True,True
1,True,False
2,True,True
3,False,True
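Because `True` counts as 1 when summed, chaining `.sum()` onto `.isna()` gives a quick per-column tally of missing values. The sketch below rebuilds the toy frame so that it runs on its own; the names `toy`, `missing_per_column`, and `total_missing` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the toy frame from above so this cell runs on its own
toy = pd.DataFrame({"name": ["Jose", "David", "Rose", np.nan], "pets": [1, np.nan, 3, 5]})

# Summing a boolean frame counts the True values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Summing the per-column counts gives the total number of missing cells
total_missing = int(missing_per_column.sum())
print(total_missing)
```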


By passing a dictionary to [**.fillna()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html), you can specify a replacement value for the missing values in each column of a DataFrame. Columns that are not listed in the dictionary are left unchanged.

In [5]:
replacements = {"pets":0, "name":"UNKNOWN"} # In column name: replacement value form
pet_counts.fillna(replacements)

Unnamed: 0,name,pets
0,Jose,1.0
1,David,0.0
2,Rose,3.0
3,UNKNOWN,5.0
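To see what happens to columns that are left out of the dictionary, here is a small sketch on the same toy data (rebuilt under the illustrative name `toy`). Only `"pets"` is listed, so the missing name should stay missing.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"name": ["Jose", "David", "Rose", np.nan], "pets": [1, np.nan, 3, 5]})

# Only "pets" is listed, so the missing name in row 3 is left as NaN
pets_filled = toy.fillna({"pets": 0})
print(pets_filled)
```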


But what if you simply want to remove problematic columns or rows from your dataset?

The [**DataFrame.dropna()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna) method by default removes all rows with one or more missing values.

In [6]:
pet_counts.dropna() # By default this drops all rows with one or more missing values

Unnamed: 0,name,pets
0,Jose,1.0
2,Rose,3.0


If we use the `subset=` argument, we can pass in a list of columns that determines which rows are dropped. For example, if we pass in a list containing `"name"` but not `"pets"`, then rows with nulls in the `name` column will be dropped, but nulls in the `pets` column will be ignored.

In [7]:
pet_counts.dropna(subset=["name"]) # Takes in a list of column names

Unnamed: 0,name,pets
0,Jose,1.0
1,David,
2,Rose,3.0
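`dropna()` has a couple of other switches worth knowing. As a rough sketch on the same toy data (rebuilt here as `toy`): `axis="columns"` drops columns instead of rows, and `how="all"` only drops rows in which every value is missing.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"name": ["Jose", "David", "Rose", np.nan], "pets": [1, np.nan, 3, 5]})

# axis="columns" drops every column that contains a missing value;
# both columns have one here, so no columns survive
no_columns = toy.dropna(axis="columns")
print(no_columns.shape)

# how="all" only drops rows in which every value is missing;
# no row in the toy frame is completely empty, so all four remain
all_rows = toy.dropna(how="all")
print(len(all_rows))
```

On a real dataset, dropping columns this way is rarely what you want, since a single missing cell is enough to lose the whole column.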


To finalize our changes to the `pet_counts` frame, we must assign the return value of `pet_counts.dropna()` back to `pet_counts` as follows. Be very cautious when dropping rows and columns, and assign to a new variable if you're not sure.

In [8]:
pet_counts = pet_counts.dropna() # Reassign so the change sticks
pet_counts

Unnamed: 0,name,pets
0,Jose,1.0
2,Rose,3.0
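The cautious alternative looks like this: keep the original frame and store the cleaned copy under a new name. The names `toy` and `complete_rows` are just placeholders for this sketch.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"name": ["Jose", "David", "Rose", np.nan], "pets": [1, np.nan, 3, 5]})

# dropna() returns a new DataFrame; toy itself is untouched
complete_rows = toy.dropna()
print(len(toy), len(complete_rows))
```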


## Cleaning and Writing DataFrames to Disk

We will finish with the Index of Asian American and Pacific Islander Artists dataset today. Load it in using `pd.read_csv()`. The file is pipe-delimited, so we pass `sep="|"`.

In [9]:
moma = pd.read_csv("IAAPI_raw_2023-10-05.csv", sep="|")
moma

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
3,AAPI-1031,,Q23881684,307448466.0,no2013110010,"Abichandani, Jaishri",Jaishri Abichandani,1969.0,,Indian,,interdisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
4,AAPI-0003,,Q19867429,,,"Acebo Davis, Terry",Terry Acebo Davis,1953.0,,Filipino/a/x,,installation artist;mixed-media artist;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Terry_Acebo_Davis,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
925,AAPI-0890,500524609.0,Q21607972,307424125.0,n2014002953,"Zheng, Chongbin",Chongbin Zheng,1961.0,,Chinese,,installation artist;painter;time-based media a...,Watsonline,https://library.metmuseum.org/search/?searchty...,Met Collection,https://www.metmuseum.org/art/collection/searc...,,,,
926,AAPI-0735,,Q8070726,,,"Zheng, Lianjie",Lianjie Zheng,1962.0,,Chinese,,installation artist;painter;performance artist...,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Zheng_Lianjie,artasiamerica,http://artasiamerica.org/artist/detail/57
927,AAPI-0736,,Q120867919,,,"Zheng, Shengtian",Shengtian Zheng,1938.0,,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
928,AAPI-0889,,Q120868879,,,"Zhong, Yueying",Yueying Zhong,1960.0,,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,


Let's make a quick adjustment so that Pandas displays all the columns in this DataFrame.

In [10]:
pd.set_option('display.max_columns', None)

Now that we know more about missing values, let's take a closer look at the `moma` DataFrame. To get a sense of the number of missing values in a DataFrame at a glance, use [**DataFrame.info()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info).

In [11]:
moma.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 930 entries, 0 to 929
Data columns (total 20 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Match point         930 non-null    object 
 1   ULAN ID             308 non-null    float64
 2   Wikidata ID         930 non-null    object 
 3   VIAF ID             418 non-null    float64
 4   LC ID               341 non-null    object 
 5   Name                930 non-null    object 
 6   Label               918 non-null    object 
 7   Birth date          820 non-null    float64
 8   Death date          248 non-null    float64
 9   Ancestry/Heritage   903 non-null    object 
 10  Indexes             31 non-null     object 
 11  Description         896 non-null    object 
 12  Watsonline          930 non-null    object 
 13  Watsonline URL      930 non-null    object 
 14  Met Collection      83 non-null     object 
 15  Met Collection URL  83 non-null     object 
 16  Wikipedi

This also contains useful information about **dtypes** and **memory usage**, which is worth tracking on larger DataFrames.
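If you only need the dtypes or the memory footprint, both are available directly. A minimal sketch on a toy frame (the real `moma` file is not loaded here):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"name": ["Jose", "David", "Rose", np.nan], "pets": [1, np.nan, 3, 5]})

# One dtype per column; "pets" is float64 because NaN is itself a float
print(toy.dtypes)

# Bytes used per column; deep=True also measures the Python strings in "name"
memory = toy.memory_usage(deep=True)
print(memory)
```

This also explains why the pet counts display as `1.0` rather than `1`: a single missing value makes the whole column floating point.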

We can use **.isna()** in combination with boolean indexing to identify problematic rows in our dataset. For example, every row has a non-null `Name`, but 12 rows do not have a `Label`. We can find them as follows.

In [12]:
moma[moma["Label"].isna()]

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
634,AAPI-0132,,Q28922612,,,"Otake, Eiko (Eiko & Koma)",,1952.0,,Japanese,,performance artist;multidisciplinary artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Eiko_%26_Koma,,
635,AAPI-1155,,Q54573836,,,"Otake, Takashi Koma (Eiko & Koma)",,1948.0,,Japanese,,performance artist;multidisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,Wikipedia,https://en.wikipedia.org/wiki/Eiko_%26_Koma,,
722,AAPI-1040,500778584.0,Q76500450,,,"Sin, Wai Kin",,1991.0,,Chinese,,multimedia artist;performance artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,Wikipedia,https://en.wikipedia.org/wiki/Victoria_Sin,,
730,AAPI-1171,,Q122923630,,,sTo Len,,1978.0,,Vietnamese,,printmaker;painter;installation artist;perform...,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
756,AAPI-1172,,Q11668731,36016359.0,n88213771,"Takai, Teiji",,1911.0,1986.0,Japanese,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
763,AAPI-0999,500330938.0,Q94519537,106303831.0,no2013022338,"Tam, Ho",,1962.0,,Hongkonger,,book/paper/zine artist;installation artist;mul...,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
766,AAPI-1010,500258803.0,Q2634269,13478511.0,n00011162,"Tam, Vivienne",,1957.0,,Chinese;Hongkonger,,designer,Watsonline,https://library.metmuseum.org/search/?searchty...,Met Collection,https://www.metmuseum.org/art/collection/searc...,Wikipedia,https://en.wikipedia.org/wiki/Vivienne_Tam,,
775,AAPI-1012,500061593.0,Q8052423,,,"Teng, Yeohlee",,1951.0,,Chinese;Malaysian,,designer,Watsonline,https://library.metmuseum.org/search/?searchty...,Met Collection,https://www.metmuseum.org/art/collection/searc...,Wikipedia,https://en.wikipedia.org/wiki/Yeohlee_Teng,,
780,AAPI-1034,,Q7777796,,,"Thein, Chaw Ei",,1969.0,,Burmese,,installation artist;multidisciplinary artist;p...,Watsonline,https://library.metmuseum.org/search~S1/X?SEAR...,,,,,,
819,AAPI-1000,500175965.0,Q21479888,9772795.0,nr2001021924,"Usui, Bumpei",,1889.0,1994.0,Japanese,,painter,Watsonline,https://library.metmuseum.org/search/?searchty...,Met Collection,https://www.metmuseum.org/art/collection/searc...,,,,


**Q**: What command could I run to drop all the rows without a `"Label"` from this DataFrame?

In [13]:
moma.dropna(subset=["Label"]) # this doesn't assign it back to the original frame

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
3,AAPI-1031,,Q23881684,307448466.0,no2013110010,"Abichandani, Jaishri",Jaishri Abichandani,1969.0,,Indian,,interdisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
4,AAPI-0003,,Q19867429,,,"Acebo Davis, Terry",Terry Acebo Davis,1953.0,,Filipino/a/x,,installation artist;mixed-media artist;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Terry_Acebo_Davis,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
925,AAPI-0890,500524609.0,Q21607972,307424125.0,n2014002953,"Zheng, Chongbin",Chongbin Zheng,1961.0,,Chinese,,installation artist;painter;time-based media a...,Watsonline,https://library.metmuseum.org/search/?searchty...,Met Collection,https://www.metmuseum.org/art/collection/searc...,,,,
926,AAPI-0735,,Q8070726,,,"Zheng, Lianjie",Lianjie Zheng,1962.0,,Chinese,,installation artist;painter;performance artist...,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Zheng_Lianjie,artasiamerica,http://artasiamerica.org/artist/detail/57
927,AAPI-0736,,Q120867919,,,"Zheng, Shengtian",Shengtian Zheng,1938.0,,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
928,AAPI-0889,,Q120868879,,,"Zhong, Yueying",Yueying Zhong,1960.0,,Chinese,,painter;ink artist/calligrapher,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,


Look closely at the number of rows at the bottom: there are twelve fewer if we drop rows without a `"Label"`.
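Rather than reading row counts off the display, you can compute the difference. The sketch below uses a tiny stand-in table (three rows borrowed from the outputs above, under the hypothetical name `artists`) instead of the full file.

```python
import numpy as np
import pandas as pd

# Small stand-in for the artist table, with two missing labels
artists = pd.DataFrame({
    "Name": ["Abad, Pacita", "sTo Len", "Tam, Ho"],
    "Label": ["Pacita Abad", np.nan, np.nan],
})

with_labels = artists.dropna(subset=["Label"])

# The number of rows removed equals the number of missing labels
rows_dropped = len(artists) - len(with_labels)
print(rows_dropped)
```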

Let's say you want to amend some of this data by providing missing descriptions for artists without them. The easiest place to start would be to identify artists with *Wikipedia* pages, as those will likely have enough information to supply the missing description.

**Q**: How could we combine **.notna()** and **.isna()** into a boolean index that finds artists *with* Wikipedia pages but *without* descriptions?

In [14]:
(moma["Description"].isna()) & (moma["Wikipedia URL"].notna())

0      False
1      False
2      False
3      False
4      False
       ...  
925    False
926    False
927    False
928    False
929    False
Length: 930, dtype: bool

Now, let's apply this selection and restrict it to the `Name`, `Description`, and `Wikipedia URL` columns.

In [15]:
moma[(moma["Description"].isna()) & (moma["Wikipedia URL"].notna())].loc[:, ["Name", "Description", "Wikipedia URL"]]

Unnamed: 0,Name,Description,Wikipedia URL
60,"Bonk, Keiko",,https://en.wikipedia.org/wiki/Keiko_Bonk
112,"Chiang, Fay",,https://en.wikipedia.org/wiki/Fay_Chiang
113,"Chiang, Janice",,https://en.wikipedia.org/wiki/Janice_Chiang
115,"Chin, Frank",,https://en.wikipedia.org/wiki/Frank_Chin
630,"Osorio, Jamaica",,https://en.wikipedia.org/wiki/Jamaica_Osorio


## Renaming Columns and Rows 
Naming variables with spaces (like `"Wikipedia URL"`) is not best practice. Let's go about fixing that.

We can use [**DataFrame.rename()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html) to pass in key-value pairs that map old names to new ones. We can use rename on either row names or column names. 

The most straightforward way to use **.rename()** is to pass in a dictionary of `{'old_name':'new_name','old_name1':'new_name1'...}` mappings as a keyword argument to either `index=` (for row labels) or `columns=` (for column names).

For example, we could rename the `"Wikipedia URL"` and `"Birth date"` columns like this.

In [16]:
new_names = {"Wikipedia URL":"Wikipedia_URL", "Birth date":"Birth__date"} # Create a dictionary
moma.rename(columns=new_names).head(5)

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth__date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia_URL,Art Asia America,AAA URL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
3,AAPI-1031,,Q23881684,307448466.0,no2013110010,"Abichandani, Jaishri",Jaishri Abichandani,1969.0,,Indian,,interdisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
4,AAPI-0003,,Q19867429,,,"Acebo Davis, Terry",Terry Acebo Davis,1953.0,,Filipino/a/x,,installation artist;mixed-media artist;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Terry_Acebo_Davis,,


**.rename()** will ignore labels that do not exist in your DataFrame. If the `old_name` of an `old:new` pair isn't found, that pair is skipped and the other renames still apply.

In [17]:
moma.rename(index=new_names).head(5) # Rename rows when we mean to rename columns

Unnamed: 0,Match point,ULAN ID,Wikidata ID,VIAF ID,LC ID,Name,Label,Birth date,Death date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline URL,Met Collection,Met Collection URL,Wikipedia,Wikipedia URL,Art Asia America,AAA URL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
3,AAPI-1031,,Q23881684,307448466.0,no2013110010,"Abichandani, Jaishri",Jaishri Abichandani,1969.0,,Indian,,interdisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
4,AAPI-0003,,Q19867429,,,"Acebo Davis, Terry",Terry Acebo Davis,1953.0,,Filipino/a/x,,installation artist;mixed-media artist;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Terry_Acebo_Davis,,


Keep this in mind when debugging: the mistake passes silently, without raising an error.
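If you would rather hear about a mismatch, **.rename()** accepts an `errors="raise"` argument. A minimal sketch on a one-column stand-in frame (`frame` and `raised` are illustrative names):

```python
import pandas as pd

frame = pd.DataFrame({"Birth date": [1946, 1976]})

try:
    # "Wikipedia URL" is not a row label, so errors="raise"
    # turns the silent no-op into a KeyError
    frame.rename(index={"Wikipedia URL": "Wikipedia_URL"}, errors="raise")
    raised = False
except KeyError as err:
    raised = True
    print("rename failed:", err)
```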

### Batch Renaming With String Functions
Several column names in this DataFrame have spaces in them. Do we really have to type them all out?

Pandas has special string-processing functions for Series and Index objects, inspired by Python's built-in [string methods](https://docs.python.org/3/library/stdtypes.html#string-methods).

Let's review a few string methods: `str.lower()`, `str.isnumeric()`, `str.replace()`.

`str.lower()` returns a lowercase version of the string.

In [19]:
shouting = "ALL CAPS"
shouting.lower()

'all caps'

`str.isnumeric()` tests whether every character in a string is numeric. Any alphabetic character, and also a minus sign or a decimal point, will cause this to return `False`.

In [20]:
shouting.isnumeric()

False

In [21]:
lucky = "12"
lucky.isnumeric()

True
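A few edge cases are worth trying, because `isnumeric()` looks at characters rather than at whether the string parses as a number. The sample strings below are just illustrations.

```python
samples = ["12", "-1", "1.5", "", "½"]

# Map each sample to the result of isnumeric()
results = {text: text.isnumeric() for text in samples}
print(results)
```

The minus sign and the decimal point are not numeric characters, the empty string has no characters to test, and the fraction character counts as numeric in Unicode.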

`str.replace(old, new)` returns a copy of a string with all occurrences of substring `old` replaced by `new`. 


A *substring* is 1 or more consecutive characters inside an existing string.
`"BAG"` is a substring of `"BAGEL"` but `"NEW YORK"` is not.

In [22]:
less_shouting = shouting.replace("ALL", "some")
less_shouting

'some CAPS'

The `str.replace()` function is especially convenient for removing unwanted characters.

In [24]:
has_spaces = "This has spaces"
has_spaces.replace(" ", "_") # Replaces spaces with _

'This_has_spaces'

Wouldn't it be convenient if we could apply these to all of our columns at once?

That's where [`Series.str.replace()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html#pandas.Series.str.replace) comes in.

We can access any DataFrame's column names using the `.columns` attribute.

In [25]:
moma.columns

Index(['Match point', 'ULAN ID', 'Wikidata ID', 'VIAF ID', 'LC ID', 'Name',
       'Label', 'Birth date', 'Death date', 'Ancestry/Heritage', 'Indexes',
       'Description', 'Watsonline', 'Watsonline URL', 'Met Collection',
       'Met Collection URL', 'Wikipedia', 'Wikipedia URL', 'Art Asia America',
       'AAA URL'],
      dtype='object')

Let's use the Pandas version of the `str.replace()` method we already know to replace spaces with underscores.

In [26]:
moma.columns.str.replace(" ", "_")

Index(['Match_point', 'ULAN_ID', 'Wikidata_ID', 'VIAF_ID', 'LC_ID', 'Name',
       'Label', 'Birth_date', 'Death_date', 'Ancestry/Heritage', 'Indexes',
       'Description', 'Watsonline', 'Watsonline_URL', 'Met_Collection',
       'Met_Collection_URL', 'Wikipedia', 'Wikipedia_URL', 'Art_Asia_America',
       'AAA_URL'],
      dtype='object')

This returns a **copy** of the column names, so we'll have to assign it back to the DataFrame.

In [27]:
moma.columns = moma.columns.str.replace(" ", "_")
moma.head(5)

Unnamed: 0,Match_point,ULAN_ID,Wikidata_ID,VIAF_ID,LC_ID,Name,Label,Birth_date,Death_date,Ancestry/Heritage,Indexes,Description,Watsonline,Watsonline_URL,Met_Collection,Met_Collection_URL,Wikipedia,Wikipedia_URL,Art_Asia_America,AAA_URL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,
3,AAPI-1031,,Q23881684,307448466.0,no2013110010,"Abichandani, Jaishri",Jaishri Abichandani,1969.0,,Indian,,interdisciplinary artist,Watsonline,https://library.metmuseum.org/search/?searchty...,,,,,,
4,AAPI-0003,,Q19867429,,,"Acebo Davis, Terry",Terry Acebo Davis,1953.0,,Filipino/a/x,,installation artist;mixed-media artist;printmaker,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Terry_Acebo_Davis,,
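Each `.str` method returns a new Index, so several clean-up steps can be chained. This sketch uses a hand-made Index with three of the original names (one padded with stray whitespace for illustration) rather than the real `moma.columns`.

```python
import pandas as pd

# A few of the original column names, one with stray whitespace added
columns = pd.Index(["Match point", " Birth date ", "Ancestry/Heritage"])

cleaned = (
    columns.str.strip()                          # trim leading/trailing whitespace
           .str.replace(" ", "_", regex=False)   # spaces to underscores
           .str.replace("/", "_", regex=False)   # slashes to underscores
)
print(list(cleaned))
```

Passing `regex=False` makes it explicit that the pattern is treated as literal text rather than a regular expression.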


As a sneak peek at passing functions as arguments, note that you can pass a function like `str.lower` (without parentheses) to **.rename()**.

In [28]:
moma.rename(mapper=str.lower, axis="columns").head(2)

Unnamed: 0,match_point,ulan_id,wikidata_id,viaf_id,lc_id,name,label,birth_date,death_date,ancestry/heritage,indexes,description,watsonline,watsonline_url,met_collection,met_collection_url,wikipedia,wikipedia_url,art_asia_america,aaa_url
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
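The function does not have to be a built-in. As a sketch, here is a hypothetical helper named `tidy` that lowercases a name and replaces spaces in one pass, applied to a small stand-in frame.

```python
import pandas as pd


def tidy(column_name):
    """Hypothetical helper: lowercase a name and swap spaces for underscores."""
    return column_name.lower().replace(" ", "_")


frame = pd.DataFrame({
    "Birth date": [1946],
    "Wikipedia URL": ["https://en.wikipedia.org/wiki/Pacita_Abad"],
})

# rename() calls tidy once for each column name
renamed = frame.rename(columns=tidy)
print(list(renamed.columns))
```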


## String Manipulation with Series

We can apply these methods on *any* Series of strings, not just column or row names.

In [29]:
moma["Name"].str.lower()

0              abad, pacita
1              abbas, hamra
2               abe, satoru
3      abichandani, jaishri
4        acebo davis, terry
               ...         
925         zheng, chongbin
926          zheng, lianjie
927        zheng, shengtian
928          zhong, yueying
929          tamotzu, chuzo
Name: Name, Length: 930, dtype: object

These string methods will *not* work on rows or columns with numeric values. (A float or integer can't be lowercase or uppercase.)

In [30]:
moma["Death_date"].str.lower()

AttributeError: Can only use .str accessor with string values!
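If you do need string methods on a numeric column, one option is to convert it first. The sketch below uses a stand-in Series rather than the real column, and goes through pandas' nullable `Int64` dtype so the years lose their trailing `.0` and the missing values stay missing.

```python
import numpy as np
import pandas as pd

death_dates = pd.Series([2004.0, np.nan, 1986.0], name="Death_date")

# Int64 (capital I) is the nullable integer dtype, so the NaN survives the cast
years_text = death_dates.astype("Int64").astype("string")
print(years_text)

# The .str accessor works now; the missing value stays missing
starts_with_20 = years_text.str.startswith("20")
print(starts_with_20)
```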

**Q**: How could you make the column names in the `moma` DataFrame uppercase? **HINT**: Don't forget to change the DataFrame!
Alter the column names, then display the first three rows of `moma`.

In [31]:
moma.columns = moma.columns.str.upper()
moma.head(3)

Unnamed: 0,MATCH_POINT,ULAN_ID,WIKIDATA_ID,VIAF_ID,LC_ID,NAME,LABEL,BIRTH_DATE,DEATH_DATE,ANCESTRY/HERITAGE,INDEXES,DESCRIPTION,WATSONLINE,WATSONLINE_URL,MET_COLLECTION,MET_COLLECTION_URL,WIKIPEDIA,WIKIPEDIA_URL,ART_ASIA_AMERICA,AAA_URL
0,AAPI-0001,500487777.0,Q466654,79468708.0,n86857749,"Abad, Pacita",Pacita Abad,1946.0,2004.0,Filipino/a/x,,painter,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Pacita_Abad,,
1,AAPI-0970,,Q47157454,102816611.0,no2007063757,"Abbas, Hamra",Hamra Abbas,1976.0,,Pakistani,,sculptor;painter;installation artist,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,,,,
2,AAPI-0002,500116914.0,Q7426381,96547973.0,no99018101,"Abe, Satoru",Satoru Abe,1926.0,,Japanese;Hawaiian (Kamaʻāina),,painter;sculptor,Watsonline,https://library.metmuseum.org/search~S1/?searc...,,,Wikipedia,https://en.wikipedia.org/wiki/Satoru_Abe,,


The full list of **[Pandas string handling methods](https://pandas.pydata.org/docs/reference/series.html#string-handling)** is available in the official documentation.

## Writing Out DataFrames
To output a `DataFrame` to disk, we can use one of several methods, including `.to_csv()`, `.to_excel()`, and `.to_sql()`. Let's practice with [`.to_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html) by writing out the cleaned version of the `moma` DataFrame.

Create a new folder named `Output` in your project folder using the File Browser tab.

In [40]:
moma.to_csv("Output/moma_cleaned.csv", index=False) 

This cell will not return any output. Instead, check the contents of the `Output` folder and press the refresh symbol if necessary.
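The folder can also be created from code, and reading the file back is a quick way to confirm what was written. In this sketch a temporary directory stands in for your project folder, and `frame` is a two-row stand-in for the cleaned table; both are assumptions made so the example runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

frame = pd.DataFrame({
    "NAME": ["Abad, Pacita", "Abbas, Hamra"],
    "BIRTH_DATE": [1946.0, 1976.0],
})

# A temporary directory stands in for the project folder in this sketch
project_dir = Path(tempfile.mkdtemp())
output_dir = project_dir / "Output"
output_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists

csv_path = output_dir / "moma_cleaned.csv"
frame.to_csv(csv_path, index=False)

# Read the file back to confirm that no extra index column was written
round_trip = pd.read_csv(csv_path)
print(list(round_trip.columns))
```

Without `index=False`, the row labels would be written as an extra unnamed first column and would reappear as `Unnamed: 0` when the file is read back.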

## Reading Data into Pandas from the Internet

Let's talk about reading in files from the internet. Our next dataset, the Programming Language Database, is hosted online, and we will read it in from a URL rather than downloading it to our local filesystem.

The Programming Language Database (PLDB) is a comprehensive database of programming languages and their common features with a popularity ranking algorithm.

You can find more information about this data in the `resources.md` file.

In [41]:
pldb = pd.read_csv("https://pldb.io/pldb.csv")
pldb


Unnamed: 0,id,name,appeared,creators,maintainers,measurements,tags,rijuRepl_website,website,spec,foundationScore,blog,releaseNotes,download,latestVersion,exampleCount,rank,lastActivity,writtenInCount,pldbScore,meetup_memberCount,meetup_groupCount,subreddit_memberCount,twitter_followers,conferences,zulip,githubRepo_created,githubRepo_updated,githubRepo_firstCommit,githubRepo_subscribers,githubRepo_forks,githubRepo_stars,githubRepo_issues,screenshot,photo,linguistGrammarRepo_commitCount,linguistGrammarRepo_firstCommit,linguistGrammarRepo_lastCommit,linguistGrammarRepo_sampleCount,linguistGrammarRepo_committerCount,redditDiscussion,roadmap,webRepl,wikipedia_related,wikipedia_appeared,wikipedia_summary,wikipedia_created,wikipedia_dailyPageViews,wikipedia_backlinksCount,wikipedia_revisionCount,wikipedia_pageId,domainName_registered,domainName_awisRank,githubBigQuery_users,githubBigQuery_repos,githubLanguage_filenames,githubLanguage_repos,githubLanguage_wrap,githubLanguage_trendingProjectsCount,githubLanguage_trendingProjects,githubLanguage_group,githubLanguage_aliases,githubLanguage_interpreters,githubLanguage_aceMode,githubLanguage_codemirrorMode,githubLanguage_codemirrorMimeType,githubLanguage_tmScope,githubLanguage_type,leachim6_filepath,projectEuler_memberCount,pygmentsHighlighter_filename,tiobe_currentRank,filenames,maintainerOrganization,forLanguages,dblp,dblp_hits,dblp_publications,mainRepo,primaryTag,standsFor,aka,oldName,equation,packageInstallCount,packageCount,packageAuthors,hoplId,isLanguage,inboundLinksCount,inboundLinks,isFinished,nativeLanguage,repoStats_firstCommit,repoStats_newestCommit,repoStats_commits,repoStats_committers,repoStats_files,repoStats_mb,description,githubRepo_description,paper,rijuRepl_description,lab,rijuRepl_fileExtensions,wikipedia_fileExtensions,githubLanguage_fileExtensions,leachim6_fileExtensions,pygmentsHighlighter_fileExtensions,fileExtensions,interviews,languageServerProtocolProject_writtenIn,writtenIn,compilesTo,leetSheets,isPublicDomain,isOpenSource,exercism,numberOfUsersEstimate,numberOfJobsEstimate,expandedMeasurements,related,runsOnVm,influencedBy,successorOf,dialectOf,subsetOf,implementationOf,renamedTo,supersetOf,extensionOf,forkOf,inputLanguages,protocols,irc,numberOfCreators,isSelfHosted,latestMajorVersion,usesSemanticVersioning,tryItOnline,demoVideo,clocExtensions,gdbSupport,visualParadigm,docs,forums,devDocs,ebook,emailList,esolang,eventsPageUrl,faq,wordRank,fileType,annualReportsUrl,antlr,replit,rosettaCode,codeMirror,monaco,quineRelay,packageRepository,ubuntuPackage,centralPackageRepositoryCount,repoStats,proposals,country,funFact,projectEuler,reference,subreddit,twitter,tiktoks,discord,mastodon,discourse,instagram,facebook,youtubes,linguistGrammarRepo_example,rijuRepl,rijuRepl_example,wikipedia_example,compilerExplorer_example,leachim6_example,pygmentsHighlighter,example,helloWorldCollection,theLanguage,leachim6,gource,languageServerProtocolProject,compilerExplorer,githubCopilotOptimized,keywords,meetup,gitRepo,specRepo,githubRepo,gitlabRepo,sourcehutRepo,rijuRepl_gitRepo,lineCommentToken,multiLineCommentTokens,printToken,stringToken,assignmentToken,booleanTokens,includeToken,canDoShebang,canReadCommandLineArgs,canUseQuestionMarksAsPartOfIdentifier,canWriteToDisk,hasAbstractTypes,hasAccessModifiers,hasAlgebraicTypes,hasAnonymousFunctions,hasArraySlicingSyntax,hasAssertStatements,hasAssignment,hasAsyncAwait,hasBinaryNumbers,hasBinaryOperators,hasBitWiseOperators,hasBlobs,hasBooleans,hasBoundedCheckedArrays,hasBreak,hasBuiltInRegex,hasCaseInsensitiveIdentifiers,hasCharacters,hasClasses,hasClobs,hasComments,hasConditionals,hasConstants,hasConstructors,hasContinue,hasDecimals,hasDecorators,hasDefaultParameters,hasDependentTypes,hasDestructuring,hasDirectives,hasDisposeBlocks,hasDocComments,hasDuckTyping,hasDynamicProperties,hasDynamicSizedArrays,hasDynamicTyping,hasEnums,hasEscapeCharacters,hasExceptions,hasExplicitTypeCasting,hasExports,hasExpressions,hasFirstClassFunctions,hasFixedPoint,hasFloats,hasFnArguments,hasForEachLoops,hasForLoops,hasFunctionComposition,hasFunctionOverloading,hasFunctions,hasGarbageCollection,hasGradualTypes,hasGenerators,hasGenerics,hasGlobalScope,hasGotos,hasHereDocs,hasHexadecimals,hasHomoiconicity,hasIds,hasIfElses,hasIfs,hasImplicitArguments,hasImplicitTypeConversions,hasImports,hasIncrementAndDecrementOperators,hasInfixNotation,hasInheritance,hasIntegers,hasInterfaces,hasIterators,hasLabels,hasLazyEvaluation,hasLineComments,hasLists,hasStandardLibrary,hasExplicitStandardLibrary,hasMacros,hasMagicGettersAndSetters,hasManualMemoryManagement,hasMapFunctions,hasMaps,hasMemberVariables,hasMessagePassing,hasMethodChaining,hasMethodOverloading,hasMethods,hasMixins,hasModules,hasMonads,hasMultiLineComments,hasMultilineStrings,hasMultipleDispatch,hasMultipleInheritance,hasNamespaces,hasNull,hasOctals,hasOperatorOverloading,hasOperators,hasPairs,hasPartialApplication,hasPatternMatching,hasPipes,hasPointers,hasPolymorphism,hasPostfixNotation,hasPrefixNotation,hasPrintDebugging,hasProcessorRegisters,hasRangeOperators,hasReferences,hasRefinementTypes,hasRegularExpressionsSyntaxSugar,hasRequiredMainFunction,hasReservedWords,hasRunTimeGuards,hasSExpressions,hasScientificNotation,hasSelfOrThisWord,hasSemanticIndentation,hasSets,hasSingleDispatch,hasSingleTypeArrays,hasSourceMaps,hasStatementTerminatorCharacter,hasStatements,hasStaticMethods,hasStaticTyping,hasStreams,hasStringConcatOperator,hasStrings,hasStructs,hasSwitch,hasSymbolTables,hasSymbols,hasTemplates,hasTernaryOperators,hasThreads,hasTimestamps,hasTraits,hasTriples,hasTryCatch,hasTypeAliases,hasTypeAnnotations,hasTypeInference,hasTypeParameters,hasTypedHoles,hasUnaryOperators,hasUnicodeIdentifiers,hasUnionTypes,hasUnitsOfMeasure,hasUserDefinedOperators,hasValueReturnedFunctions,hasVariableSubstitutionSyntax,hasVariadicFunctions,hasVirtualFunctions,hasVoidFunctions,hasWhileLoops,hasZeroBasedNumbering,hasZippers,isCaseSensitive,letterFirstIdentifiers,mergesWhitespace,supportsBreakpoints,jupyterKernel,wikipedia,bookCount,paperCount,hopl,pypl,tiobe,domainName,githubBigQuery,linguistGrammarRepo,hackerNewsDiscussions,isbndb,githubLanguage,indeedJobs,linkedInSkill,stackOverflowSurvey,semanticScholar,goodreads
0,javascript,JavaScript,1995,Brendan Eich,,144,pl,,,https://ecma-international.org/publications-an...,524,,,,es14,5,1,,,25357,3151948.0,5270.0,,,,,,,,,,,,,,1133.0,2013.0,2018.0,38.0,103.0,,,https://playcode.io/javascript/,java lua scheme perl self c python awk hyperta...,1995.0,"JavaScript (), often abbreviated as JS, is a h...",2001.0,4264.0,8982.0,6131.0,9845.0,,,566345.0,1099879.0,Jakefile,16046489.0,,26.0,author name avatar url language languageColor ...,,js or node,chakra d8 gjs js node nodejs qjs rhino v8 v8-s...,javascript,javascript,text/javascript,source.js,programming,j/JavaScript.js,,javascript.py,6.0,,,,,,,,pl,,es5,,,,,,2133.0,True,576,11ty abs ace ait al alumina amber apache-hbase...,False,,,,,,,,,,,,Netscape,,,js _js bones cjs es es6 frag gs jake javascrip...,js,js jsm mjs cjs,,,,,,https://cheatsheets.zip/javascript,,True,https://exercism.org/tracks/javascript,5962666,63993,240,,,java self scheme,,,,,,,,,,,,1.0,,14.0,True,,,_js bones cjs es6 jake jakefile js jsb jscad j...,,,https://developer.mozilla.org/en-US/docs/Web/J...,,,https://eloquentjavascript.net/,,,,,3002.0,text,https://stateofjs.com/en-us/,https://github.com/antlr/grammars-v4/tree/mast...,https://repl.it/languages/javascript,http://www.rosettacode.org/wiki/Category:JavaS...,javascript,javascript,JavaScript,http://npmjs.org,nodejs,,,https://github.com/tc39/proposals,United States,The name Java in JavaScript was pure marketing...,ECMAScript,https://www.w3schools.com/js/js_reserved.asp,https://reddit.com/r/javascript,,,,,,,,,"alert(""dude!"")",https://riju.codes/javascript,"console.log(""Hello, world!"");",var minstake = 0.00000100; // valor base ...,,"console.log(""Hello World"");",JavaScript,,"// Hello world in JavaScript console.log(""Hell...",,JavaScript,,,,True,abstract arguments await boolean break byte ca...,https://www.meetup.com/topics/javascript,,,,,,,//,/* */,console.log,`,=,true 
false,,,,,,False,False,,True,,,True,True,True,True,True,,True,,,,False,,True,,True,True,True,True,,,,True,,True,True,,,,True,,True,False,,True,,True,True,True,,True,,,,True,False,True,True,,True,,,,,True,,,,,,True,True,True,True,True,True,,,True,,True,True,True,,False,True,,True,,,,True,,True,,,,True,True,False,False,,,True,False,True,,True,,,False,True,,,True,False,,True,,True,,,,False,True,,False,True,True,,True,,True,,,,,True,,True,,True,,True,,,,,,,,,,,,,,,,,False,,,,True,True,,True,True,True,True,https://github.com/n-riesco/ijavascript,https://en.wikipedia.org/wiki/JavaScript,351,48,https://hopl.info/showlanguage.prx?exp=2133,JavaScript,JavaScript,,JavaScript,https://github.com/atom/language-javascript,,year|publisher|title|authors|isbn13\n2014|Wile...,JavaScript,javascript developer,javascript,,year|title|doi|citations|influentialCitations|...,title|year|author|goodreadsId|rating|ratings|r...
1,c,C,1972,Dennis Ritchie,,99,pl,,,https://www.iso-9899.info/wiki/The_Standard,404,,,,C17,5,2,,,25335,69338.0,204.0,,,,,,,,,,,,,,359.0,2005.0,2018.0,57.0,23.0,,,,cyclone unified-parallel-c split-c cilk b bcpl...,2011.0,"C (, as in the letter c) is a general-purpose,...",2001.0,6268.0,10585.0,7316.0,6021.0,,,177962.0,292876.0,,2160271.0,,26.0,author name avatar url language languageColor ...,,,tcc,c_cpp,clike,text/x-csrc,source.c,programming,c/C.c,,c_cpp.py,2.0,,,,,,,,pl,,,,,,,,577.0,True,443,acorn-lang ad-hoc adamant adept alumina alumin...,False,,,,,,,,,,,,Bell Labs,c h,Mono,c cats h idc,,c h idc x[bp]m,,,,,,https://courses.cs.washington.edu/courses/cse3...,,True,https://exercism.org/tracks/c,3793768,59919,141,,,,,,,,,,,,,,,1.0,,17.0,False,,,c cats ec idc pgc,True,,https://devdocs.io/c/,,,,,,,,81.0,text,,https://github.com/antlr/grammars-v4/tree/mast...,https://repl.it/languages/c,http://www.rosettacode.org/wiki/Category:C,,cpp,C,,gcc,,,,United States,"C gets credit for the // comments, starting i...",C/C++,http://www.c4learn.com/c-programming/c-keywords/,https://reddit.com/r/C_Programming,,,,,,,,,#ifndef HELLO_H #define HELLO_H void hello();...,https://riju.codes/c,"#include <stdio.h> int main() { printf(""Hel...",#include <stdio.h> int main(void) { print...,"// Type your code here, or load an example. 
in...","#include <stdio.h> main() { printf(""Hello...",C,,,,C,,,C,,auto break case char const continue default do...,https://www.meetup.com/topics/c,,,,,,https://github.com/llvm/llvm-project,//,/* */,printf,,=,,,,,,,,False,,,,True,True,,,,True,,True,,,,False,True,False,,True,True,True,False,,,,,,,True,,,,,,,True,,False,True,,,,False,,,,,,,,False,,,,,True,,,,,,,,,True,True,,,True,,,,,True,,True,True,True,,True,,,,,,,,,,,True,,,False,False,,,False,True,,,,,True,,,,True,,,,,False,,,,,True,,False,,,,,,,,,,,True,True,True,True,,False,True,,,,,,,,,,,,,,,,,False,True,,,True,True,,True,,,,https://github.com/brendan-rius/jupyter-c-kernel,https://en.wikipedia.org/wiki/C_(programming_l...,78,19,https://hopl.info/showlanguage.prx?exp=577,C,C,,C,https://github.com/textmate/c.tmbundle,,year|publisher|title|authors|isbn13\n2003|McGr...,C,c engineer,c,,year|title|doi|citations|influentialCitations|...,
2,python,Python,1991,Guido van Rossum,,120,pl,https://www.python.org/,https://www.python.org/,https://docs.python.org/3/reference/,393,,https://docs.python.org/3/whatsnew/,https://www.python.org/downloads/,3.13.0,3,3,,32.0,25335,1424303.0,1964.0,,,,,2017.0,2024.0,,1511.0,29561.0,61378.0,8730.0,,,,,,20.0,,,,,jython micropython stackless-python cython abc...,1991.0,Python is a widely used high-level programming...,2001.0,7204.0,6849.0,6342.0,23862.0,1995.0,,297138.0,550171.0,.gclient DEPS SConscript SConstruct Snakefile ...,9300725.0,,26.0,author name avatar url language languageColor ...,,python3 or rusthon,python python2 python3,python,python,text/x-python,source.python,programming,,,python.py,1.0,,,,,,,https://github.com/python/cpython,pl,,,,,,,,1658.0,True,411,aardvark ace adept aheui ail aith alumina ana ...,False,,1990.0,2025.0,156324.0,3360.0,5121.0,681.0,,,,,Centrum Wiskunde & Informatica,py pyi pyc pyd pyo pyw pyz,py pyc pyd pyo,py cgi fcgi gyp gypi lmi py3 pyde pyi pyp pyt ...,,py pyw jy sage sc SConstruct SConscript bzl BU...,py pyc pyd pyo,,csharp,python restructuredtext c xml toml yaml bourne...,,https://cheatsheets.zip/python,,True,https://exercism.org/tracks/python,2971459,46976,216,,,,,,,,,,,,,,,1.0,True,3.0,True,,,buck build.bazel gclient gyp gypi lmi py py3 p...,,False,https://docs.python.org/3/,,,,https://mail.python.org/mailman/listinfo,,https://www.python.org/events/,https://docs.python.org/3/faq/,4048.0,text,https://www.python.org/psf-landing/,https://github.com/antlr/grammars-v4/tree/mast...,https://repl.it/languages/python,http://www.rosettacode.org/wiki/Category:Python,python,python,Python,https://pypi.python.org/pypi,python,,,https://peps.python.org/,Netherlands,,Python,https://www.programiz.com/python-programming/k...,https://reddit.com/r/Python,,,https://www.pythondiscord.com/,,,,,,"#!/usr/bin/env python2.4 print ""Python""",https://riju.codes/python,"print(""Hello, world!"")",,def square(num): return num * 
num,,Python,,,,,https://www.youtube.com/watch?v=9mput42uZsQ,https://github.com/Microsoft/python-language-s...,Python,True,and as assert break class continue def del eli...,https://www.meetup.com/topics/python,,,https://github.com/python/cpython,,,https://github.com/python/cpython,#,''',print,,=,True False,,True,,,True,,,,,,,True,,True,,True,,True,,,,False,,True,,True,True,,True,,,,,,,True,True,,True,True,,,False,,,,,,,,True,,,,,,True,,,True,,,,,True,,,,,,,True,,True,True,True,,True,,,True,True,True,,False,,,,,,,,,,True,,,True,True,,True,,,True,True,,,,,,False,,,,True,,,,,False,,,,,True,,True,,True,,,,,,,,,True,,,True,,,True,True,,,,,,,,,,,,,False,,,False,,,,True,True,,True,,,,,https://en.wikipedia.org/wiki/Python_(programm...,342,52,https://hopl.info/showlanguage.prx?exp=1658,Python,Python,python.org,Python,https://github.com/tree-sitter/tree-sitter-python,,year|publisher|title|authors|isbn13\n2014|No S...,Python,python engineer,python,,year|title|doi|citations|influentialCitations|...,title|year|author|goodreadsId|rating|ratings|r...
3,java,Java,1995,James Gosling,,109,pl,,https://openjdk.org/,https://docs.oracle.com/javase/specs/,142,https://blogs.oracle.com/java/,https://openjdk.org/projects/jdk-updates/,https://www.oracle.com/java/technologies/downl...,20,6,4,,34.0,25325,1162766.0,2090.0,,,,,2018.0,2024.0,,329.0,5337.0,19037.0,311.0,,,283.0,2004.0,2018.0,9.0,21.0,,,,javascript pizza ada csharp eiffel mesa modula...,1995.0,Java is a general-purpose computer programming...,2001.0,5242.0,11543.0,7818.0,15881.0,,,216933.0,369548.0,,11529980.0,,26.0,author name avatar url language languageColor ...,,,,java,clike,text/x-java,source.java,programming,j/Java.java,,jvm.py,3.0,,,,,,,https://github.com/openjdk/jdk,pl,,OpenJDK,,,,,,2131.0,True,153,abcl-lang ace apache-hbase arrow-format avail ...,False,,2007.0,2025.0,85056.0,2003.0,68966.0,1317.0,,,,,Sun Microsystems,,,java jav,java,java,,,java,java cpp xml c html bourne-shell xsd objective...,,https://cheatsheets.zip/java,,True,https://exercism.org/tracks/java,5587175,85206,204,,,c cpp,,,,,,,,,,,,1.0,True,20.0,True,,,java,,False,,,https://openjdk.org/guide/,https://sd.blackball.lv/library/thinking_in_ja...,https://mail.openjdk.org/mailman/listinfo,,https://dev.java/community/events/,,1489.0,text,,https://github.com/antlr/grammars-v4/tree/mast...,https://repl.it/languages/java,http://www.rosettacode.org/wiki/Category:Java,,java,Java,https://mvnrepository.com/popular,openjdk-8-jdk,,,https://openjdk.org/jeps/0,United States,,Java,,https://reddit.com/r/java,https://twitter.com/java,,,,,,,,/** * Copyright (c) Rich Hickey. All rights...,https://riju.codes/java,public class Main { public static void mai...,// Hello.java (Java SE 5) import javax.swing.*...,"// Type your code here, or load an example. 
cl...",public class Java { public static void main(S...,Java,,// Hello World in Java class HelloWorld { s...,,Java,,https://github.com/georgewfraser/vscode-javac,Java,,abstract continue for new switch assert defaul...,https://www.meetup.com/topics/java,,,https://github.com/openjdk/jdk,,,,//,/* */,System.out.println,"""",,true false,,,,,,,True,,,,True,,,True,,,,True,,,,False,,True,,True,True,True,True,,,,,,,,,,,,,,,,True,,,,,,True,,,,,,,True,,,True,,,,True,,,,,,,True,True,,True,True,True,True,,,True,,True,,False,,,,,,,,,,,True,,True,,,,,,True,False,,,,,,False,,,,True,,,,,,,,,False,True,,False,,True,,,,,,,,,True,,True,,,,,True,,,,,,,,,,,,,,,,False,,,,True,True,,True,,,,https://github.com/SpencerPark/IJava,https://en.wikipedia.org/wiki/Java_(programmin...,401,37,https://hopl.info/showlanguage.prx?exp=2131,Java,Java,,Java,https://github.com/textmate/java.tmbundle,,year|publisher|title|authors|isbn13\n2017|Pear...,Java,java engineer,java,,year|title|doi|citations|influentialCitations|...,title|year|author|goodreadsId|rating|ratings|r...
4,cpp,C++,1985,Bjarne Stroustrup,,79,pl,,http://isocpp.org/,https://isocpp.org/std/the-standard,292,https://www.isocpp.org/blog,https://en.cppreference.com/w/cpp/language/his...,,C++20,6,5,,,25314,69338.0,204.0,,,https://cppcon.org,,,,,,,,,,,,,,49.0,,,,,ada algol-68 c clu ml simula python csharp cha...,1998.0,C++ ( pronounced cee plus plus) is a general-p...,2001.0,4307.0,10943.0,1487.0,72038.0,2012.0,,170927.0,277733.0,,2161625.0,,26.0,author name avatar url language languageColor ...,,cpp,,c_cpp,clike,text/x-c++src,source.c++,programming,c/C++.cpp,,c_cpp.py,4.0,,,,,,,,pl,,,,,,,,1202.0,True,309,ace acorn-lang apache-hbase arduino arkscript ...,False,,,,,,,,,,,,Bell Labs,C cc cpp cxx c++ h hh hpp hxx h++,,cpp c++ cc cp cxx h h++ hh hpp hxx inc inl ino...,cpp,cpp hpp c++ h++ cc hh cxx hxx C H cp CPP tpp,,,,,,,,True,https://exercism.org/tracks/cpp,4128238,61098,214,,,,,,,,,c,,,,,,1.0,,20.0,False,,,C c++ c++m cc ccm CPP cpp cppm cxx cxxm h++ in...,True,False,https://devdocs.io/cpp/,,,,https://lists.isocpp.org/mailman/listinfo.cgi,,https://isocpp.org/blog/category/events,,,text,https://isocpp.org/about/annual-reports,https://github.com/antlr/grammars-v4/tree/mast...,https://repl.it/languages/cpp,,,cpp,C++,,g++,,,,United States,,C/C++,,https://reddit.com/r/cpp,https://twitter.com/isocpp,,,,,,,,#include <cstdint> namespace Gui { },https://riju.codes/cpp,#include <iostream> int main() { std::cout ...,1 #include <iostream> 2 #include <vector> 3 #i...,"// Type your code here, or load an example. 
in...",#include <iostream> int main() { std::cout...,C++,,// Hello World in C++ (pre-ISO) #include <ios...,,C++,,,C++,True,#define #defined #elif #else #endif #error #if...,https://www.meetup.com/topics/c,,,,,,https://github.com/llvm/llvm-project,//,/* */,std::cout,"""",=,true false,,,,,,,True,,,,,,,,,,,,,,,,,True,,,,,True,,,,,,,,,,,,,,,,True,,,,,,,,,,,True,,,,,,,,,,,,,,,,,,,,,,True,,,,,True,True,,False,,,,,,,,,,,,,,,True,True,,,True,,,True,,,,,,,,,,,,,,,,,,,,,True,,,,,,,,,,,,,,True,,True,,,,,,,,,,,,,,,,,,True,,,,,,,,,https://github.com/QuantStack/xeus-cling,https://en.wikipedia.org/wiki/C++,128,6,https://hopl.info/showlanguage.prx?exp=1202,C++,C++,isocpp.org,C++,https://github.com/textmate/c.tmbundle,,year|publisher|title|authors|isbn13\n2011|PEAR...,C++,c++ engineer,c++,,year|title|doi|citations|influentialCitations|...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5069,brackets-editor,brackets-editor,2012,,,4,editor,,,,0,,,,,0,5070,,,15967,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,editor,,,,,,,,,False,0,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,
5070,clion-editor,clion-editor,2015,,,4,editor,,,,0,,,,,0,5071,,,15967,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,editor,,,,,,,,,False,0,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,
5071,code-blocks-editor,code-blocks-editor,2005,,,4,editor,,,,0,,,,,0,5072,,,15967,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,editor,,,,,,,,,False,0,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,
5072,codelite-editor,codelite-editor,2006,,,4,editor,,,,0,,,,,0,5073,,,15967,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,editor,,,,,,,,,False,0,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,,,,,,,,,,,,,,


This dataset is quite large! We know it is ordered by rank, so let's work with only the top 500 (that is, the first 500) programming languages and the first 20 columns.
This calls for yesterday's `.iloc[]`, which selects rows and columns by position.
We'll also set the `id` column as the DataFrame's index by passing it to the `index_col` argument.

In [42]:
pldb = pd.read_csv("https://pldb.io/pldb.csv", index_col="id", low_memory=False).iloc[:500, :20]
pldb

id,name,appeared,creators,maintainers,measurements,tags,rijuRepl_website,website,spec,foundationScore,blog,releaseNotes,download,latestVersion,exampleCount,rank,lastActivity,writtenInCount,pldbScore,meetup_memberCount
javascript,JavaScript,1995,Brendan Eich,,144,pl,,,https://ecma-international.org/publications-an...,524,,,,es14,5,1,,,25357,3151948.0
c,C,1972,Dennis Ritchie,,99,pl,,,https://www.iso-9899.info/wiki/The_Standard,404,,,,C17,5,2,,,25335,69338.0
python,Python,1991,Guido van Rossum,,120,pl,https://www.python.org/,https://www.python.org/,https://docs.python.org/3/reference/,393,,https://docs.python.org/3/whatsnew/,https://www.python.org/downloads/,3.13.0,3,3,,32.0,25335,1424303.0
java,Java,1995,James Gosling,,109,pl,,https://openjdk.org/,https://docs.oracle.com/javase/specs/,142,https://blogs.oracle.com/java/,https://openjdk.org/projects/jdk-updates/,https://www.oracle.com/java/technologies/downl...,20,6,4,,34.0,25325,1162766.0
cpp,C++,1985,Bjarne Stroustrup,,79,pl,,http://isocpp.org/,https://isocpp.org/std/the-standard,292,https://www.isocpp.org/blog,https://en.cppreference.com/w/cpp/language/his...,,C++20,6,5,,,25314,69338.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
dat-protocol,dat-protocol,2013,Max Ogden,,14,protocol,,https://dat.foundation/,,0,,,,14.0.3,0,496,,6.0,23322,
mariadb,MariaDB,2009,,,16,queryLanguage,,,,0,,,,,1,497,,,23317,
io,Io,2002,Steve Dekorte,,39,pl,https://iolanguage.org/,https://iolanguage.org/,,0,,,,,3,498,,,23313,
sourcepawn,SourcePawn,2014,,,24,pl,,,,0,,,,,1,499,,14.0,23312,


Because we have an index column, we can grab rows by their `id`. These ids are intuitive: a standardized form of the name of the programming language.

In [43]:
pldb.index

Index(['javascript', 'c', 'python', 'java', 'cpp', 'html', 'css', 'perl',
       'php', 'ruby',
       ...
       'xtext', 'fancy', 'curl', 'lfe', 'clips', 'dat-protocol', 'mariadb',
       'io', 'sourcepawn', 'qalb'],
      dtype='object', name='id', length=500)
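
To experiment with `index_col` and `.iloc[]` without downloading the full dataset, here is a minimal sketch that reads a tiny inline CSV. The three rows are copied from the output above; the `StringIO` buffer and the name `toy` are stand-ins introduced only for this illustration.

```python
import io

import pandas as pd

# A three-row stand-in for the real CSV (values copied from the output above).
csv_text = """id,name,appeared,creators
javascript,JavaScript,1995,Brendan Eich
c,C,1972,Dennis Ritchie
python,Python,1991,Guido van Rossum
"""

# index_col="id" turns the id column into the row index,
# then .iloc keeps the first 2 rows and the first 2 columns by position.
toy = pd.read_csv(io.StringIO(csv_text), index_col="id").iloc[:2, :2]

print(toy.index.tolist())
print(toy.columns.tolist())
```

Note that once `id` becomes the index it no longer counts as a column, so the column slice `:2` starts at `name`.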

### A Review of Selection

To select the row about R, we can use `.loc[]` with the index label `"r"`.

In [44]:
pldb.loc["r"]

name                                                             R
appeared                                                      1993
creators                           Ross Ihaka and Robert Gentleman
maintainers                                                    NaN
measurements                                                    78
tags                                                  pl arrayLang
rijuRepl_website                                               NaN
website                                  https://www.r-project.org
spec                                                           NaN
foundationScore                                                 34
blog                  https://developer.r-project.org/Blog/public/
releaseNotes                      https://developer.r-project.org/
download              https://cran.r-project.org/bin/windows/base/
latestVersion                                                4.4.0
exampleCount                                                  

If we want to find the `creators` of the language `python`, we can use `.loc` to retrieve the specific row and column.

In [45]:
pldb.loc["python", "creators"]

'Guido van Rossum'

Pandas indices do *not* have to be unique. If there were two rows indexed `"python"`, then this query would return both. However, having a unique index for each row is best practice, even if that unique index is just an integer running from 0 to one less than the number of rows.
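
The sketch below illustrates this with a made-up two-row DataFrame whose index repeats the label `"python"`. The second creator name is invented purely for the example and the DataFrame `dupes` is hypothetical.

```python
import pandas as pd

# Hypothetical data: two rows share the index label "python".
dupes = pd.DataFrame(
    {"creators": ["Guido van Rossum", "Someone Else"]},
    index=["python", "python"],
)

# With a duplicated label, .loc returns every matching row (a Series here),
# not a single scalar value.
result = dupes.loc["python", "creators"]
print(type(result).__name__, len(result))

# Index.is_unique is a quick check before relying on label lookups.
print(dupes.index.is_unique)
```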

## Manipulating DataFrames

Sometimes we want to change values in DataFrames in a systematic way. Let's start with creating new columns as a function of **other columns**.

Operations on Series are vectorized. That means they are applied element-wise to every entry in the Series at once, without writing an explicit loop. Let's look at the numeric column `"appeared"`, which is the year the programming language was introduced to the public.

In [46]:
pldb["appeared"]

id
javascript      1995
c               1972
python          1991
java            1995
cpp             1985
                ... 
dat-protocol    2013
mariadb         2009
io              2002
sourcepawn      2014
qalb            2012
Name: appeared, Length: 500, dtype: int64

One way to manipulate a **Series** is to add a scalar constant.

In [47]:
pldb["appeared"] + 2000 # This is shorthand for adding 2000 to each value in the Series

id
javascript      3995
c               3972
python          3991
java            3995
cpp             3985
                ... 
dat-protocol    4013
mariadb         4009
io              4002
sourcepawn      4014
qalb            4012
Name: appeared, Length: 500, dtype: int64
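
As a rough illustration of what "vectorized" means, the sketch below compares the scalar addition with an explicit Python loop on a three-value toy Series. The values are copied from the output above; the names `vectorized` and `looped` are introduced only for this example.

```python
import pandas as pd

# A small stand-in for pldb["appeared"].
appeared = pd.Series(
    [1995, 1972, 1991],
    index=["javascript", "c", "python"],
    name="appeared",
)

# Vectorized: one expression, applied to every element.
vectorized = appeared + 2000

# Equivalent explicit loop, rebuilt into a Series with the same index.
looped = pd.Series([year + 2000 for year in appeared], index=appeared.index)

# Both approaches produce the same values and keep the same labels.
print(vectorized.tolist() == looped.tolist())
print(vectorized.index.equals(looped.index))
```

The vectorized form is shorter and, on large Series, typically much faster than looping in Python.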

Multiplication by a scalar is also simple and vectorized.

In [48]:
pldb["appeared"] * 0.5 # Multiply each value by 0.5

id
javascript       997.5
c                986.0
python           995.5
java             997.5
cpp              992.5
                 ...  
dat-protocol    1006.5
mariadb         1004.5
io              1001.0
sourcepawn      1007.0
qalb            1006.0
Name: appeared, Length: 500, dtype: float64

Unlike *aggregation* functions, vectorized operations return a Series or DataFrame of the same size. 

In [51]:
pldb["appeared"].max() # Returns a single value because max() aggregates

2024
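
To make the contrast concrete, here is a minimal sketch on a three-value toy Series (values copied from the earlier output; the variable names are assumptions for illustration). The vectorized operation keeps the shape of its input, while the aggregation collapses it to one number.

```python
import pandas as pd

# A small stand-in for pldb["appeared"].
appeared = pd.Series(
    [1995, 1972, 1991],
    index=["javascript", "c", "python"],
    name="appeared",
)

halved = appeared * 0.5   # vectorized: one result per input row
latest = appeared.max()   # aggregation: a single scalar

print(halved.shape)
print(latest)
```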