---
title: "Indexing, Slicing and Subsetting DataFrames in Python"
teaching: 15
exercises: 10
questions:
- "How do we access different parts of a DataFrame?"
objectives:
- "Learn about 0-based indexing in Python."
- "Learn about numeric vs. label based indexes."
- "Learn how to select subsets of data from a DataFrame using Slicing and
   Indexing methods."
- "Understand what a boolean object is and how it can be used to 'mask' or
   identify particular sets of values within another object."

keypoints:
- "Indexing & Slicing."
---

## Making Sure Our Data Are Loaded

We will continue to use the articles dataset that we worked with in the last
exercise. Let's reopen it:

In [1]:
# first make sure pandas is loaded
import pandas as pd
# read in the articles csv
articles_df = pd.read_csv("data/articles.csv")

{: .source}

# Indexing & Slicing in Python

We often want to work with subsets of a **DataFrame** object. There are
different ways to accomplish this including: using labels (column headings),
numeric ranges or specific x,y index locations.


## Selecting Data Using Labels (Column Headings)

We use square brackets `[]` to select a subset of a Python object. For example,
we can select all of the data from a column named `Authors` from the `articles_df`
DataFrame by name:

In [2]:
articles_df['Authors']

0                          Flavia Pennini|Angelo Plastino
1                              Naveed Aslam|Peter C. Wynn
2       Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...
3       Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...
4       Magali Troin|Richard Arsenault|François Brissette
5       Xiaoxiao Qin|Yun Feng Xing|Zhiqin Zhou|Yuncong...
6                   Anton Axelsson|Linda Ta|Henrik Sundén
7       Weihong Min|Huiying Li|Hongmei Li|Chunlei Liu|...
8                                  Tosiaki Kori|Yuto Imai
9       Christina Schraml|Sascha Kaufmann|Hansjoerg Re...
10      Mark Billing|Tobias Rudolph|Eric Täuscher|Rain...
11      Simon Karpenko|Ivan Konovalenko|Alexander Mill...
12      Kelly M. Fulton|Elena Mendoza-Barberá|Susan M....
13           Gongde Wu|Xiaoli Wang|Taineng Jiang|Qibo Lin
14      Wei Long|Wenge Qiu|Chongwei Guo|Chuanqiang Li|...
15                                       Andrew J. Larner
16      Komaraiah Palle|Chinnadurai Mani|Kaushlendra T...
17           C

{: .source}

This syntax, calling the column as an attribute, gives you the same output:

In [3]:
articles_df.Authors

0                          Flavia Pennini|Angelo Plastino
1                              Naveed Aslam|Peter C. Wynn
2       Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...
3       Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...
4       Magali Troin|Richard Arsenault|François Brissette
5       Xiaoxiao Qin|Yun Feng Xing|Zhiqin Zhou|Yuncong...
6                   Anton Axelsson|Linda Ta|Henrik Sundén
7       Weihong Min|Huiying Li|Hongmei Li|Chunlei Liu|...
8                                  Tosiaki Kori|Yuto Imai
9       Christina Schraml|Sascha Kaufmann|Hansjoerg Re...
10      Mark Billing|Tobias Rudolph|Eric Täuscher|Rain...
11      Simon Karpenko|Ivan Konovalenko|Alexander Mill...
12      Kelly M. Fulton|Elena Mendoza-Barberá|Susan M....
13           Gongde Wu|Xiaoli Wang|Taineng Jiang|Qibo Lin
14      Wei Long|Wenge Qiu|Chongwei Guo|Chuanqiang Li|...
15                                       Andrew J. Larner
16      Komaraiah Palle|Chinnadurai Mani|Kaushlendra T...
17           C

{: .source}
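Attribute access is a convenience and does not work for every column. As a minimal sketch (using a tiny made-up DataFrame rather than the articles data, with hypothetical column names `Author Count` and `count`), a column whose name contains a space, or clashes with an existing DataFrame method such as `count`, can only be reached reliably with square brackets:

```python
import pandas as pd

# a tiny, made-up DataFrame used only for illustration
toy_df = pd.DataFrame({
    "Authors": ["A. Author", "B. Writer|C. Editor"],
    "Author Count": [1, 2],   # name contains a space
    "count": [10, 20],        # name clashes with the DataFrame.count method
})

# both forms return the same column when the name is a simple identifier
same = toy_df["Authors"].equals(toy_df.Authors)

# square brackets work for any column name
author_counts = toy_df["Author Count"]
count_column = toy_df["count"]

# attribute access finds the method here, not the column
count_attribute = toy_df.count

print(same)
print(callable(count_attribute))
```

For this reason the bracket form is usually the safer habit in scripts, while the attribute form is handy for quick interactive exploration.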

We can also create a new object that contains the data within the `Authors`
column as follows:

In [4]:
# create an object named authors that only contains the *Authors* column
authors = articles_df['Authors']

{: .source}

We can pass a list of column names too, as an index to select columns in that
order. This is useful when we need to reorganize our data.

**NOTE:** If a column name is not contained in the DataFrame, an exception
(error) will be raised.

In [5]:
# select the Authors and ISSNs (journal identifiers) columns from the DataFrame
articles_df[['Authors', 'ISSNs']]

Unnamed: 0,Authors,ISSNs
0,Flavia Pennini|Angelo Plastino,1099-4300
1,Naveed Aslam|Peter C. Wynn,2077-0472
2,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...,1422-0067
3,Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...,2304-6740
4,Magali Troin|Richard Arsenault|François Brissette,2306-5338
5,Xiaoxiao Qin|Yun Feng Xing|Zhiqin Zhou|Yuncong...,1420-3049
6,Anton Axelsson|Linda Ta|Henrik Sundén,2073-4344
7,Weihong Min|Huiying Li|Hongmei Li|Chunlei Liu|...,1422-0067
8,Tosiaki Kori|Yuto Imai,2073-8994
9,Christina Schraml|Sascha Kaufmann|Hansjoerg Re...,2075-4418


In [6]:
# what happens when you flip the order?
articles_df[['ISSNs', 'Authors']]

Unnamed: 0,ISSNs,Authors
0,1099-4300,Flavia Pennini|Angelo Plastino
1,2077-0472,Naveed Aslam|Peter C. Wynn
2,1422-0067,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...
3,2304-6740,Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...
4,2306-5338,Magali Troin|Richard Arsenault|François Brissette
5,1420-3049,Xiaoxiao Qin|Yun Feng Xing|Zhiqin Zhou|Yuncong...
6,2073-4344,Anton Axelsson|Linda Ta|Henrik Sundén
7,1422-0067,Weihong Min|Huiying Li|Hongmei Li|Chunlei Liu|...
8,2073-8994,Tosiaki Kori|Yuto Imai
9,2075-4418,Christina Schraml|Sascha Kaufmann|Hansjoerg Re...


In [7]:
# what happens if you ask for a column that doesn't exist?
articles_df['column_that_does_not_exist']

KeyError: 'column_that_does_not_exist'

{: .source}
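Because a missing column name raises a `KeyError`, it can help to check names before selecting. The following is a small sketch on a made-up DataFrame (the data, the placeholder ISSN values and the `wanted` list are assumptions for illustration, not part of the articles dataset):

```python
import pandas as pd

# made-up data; the ISSN values are placeholders
toy_df = pd.DataFrame({
    "Title": ["Paper one", "Paper two"],
    "Authors": ["A. Author", "B. Writer|C. Editor"],
    "ISSNs": ["0000-0001", "0000-0002"],
})

# a list of names selects those columns in the order given
reordered = toy_df[["ISSNs", "Authors"]]
print(list(reordered.columns))

# keep only the requested names that really exist in the DataFrame
wanted = ["Authors", "column_that_does_not_exist"]
present = [name for name in wanted if name in toy_df.columns]
print(present)

# asking for a missing column raises KeyError, which we can catch
try:
    toy_df["column_that_does_not_exist"]
except KeyError as error:
    print("KeyError:", error)
```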


## Extracting Range based Subsets: Slicing

**REMINDER**: Python Uses 0-based Indexing

Let's remind ourselves that Python uses 0-based
indexing. This means that the first element in an object is located at
position 0.
This is different from other tools like R and Matlab that index elements
within objects starting at 1.

In [None]:
# Create a list of numbers
grades = [88, 72, 93, 94]

{: .source}

![indexing diagram]({{ page.root }}/fig/slicing-indexing.svg)
![slicing diagram]({{ page.root }}/fig/slicing-slicing.svg)
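
The same rules can be checked on any Python list. This short sketch uses a different, made-up list (`letters` is not part of the lesson data) to show that positions start at 0 and that a slice includes its start but stops just before its stop:

```python
letters = ["a", "b", "c", "d", "e"]

first = letters[0]          # position 0 is the first element
last = letters[-1]          # negative positions count from the end
middle = letters[1:4]       # positions 1, 2 and 3 (4 is excluded)
first_three = letters[:3]   # an omitted start means "from the beginning"

# the length of a slice is stop - start
slice_length = len(letters[1:4])

print(first, last, middle, first_three, slice_length)
```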

> ## Challenge
>
> 1. What value does the code below return?
>
> ~~~
> grades[0]
> ~~~
> {: .source}
> 2. How about this:
>
> ~~~
> grades[4]
> ~~~
> {: .source}
> 3. Or this?
>
> ~~~
> grades[len(grades)]
> ~~~
> {: .source}
> 4. In the example above, calling `grades[4]` returns an error. Why is that?
{: .challenge}

## Slicing Subsets of Rows in Python

Slicing using the `[]` operator selects a set of rows and/or columns from a
DataFrame. To slice out a set of rows, you use the following syntax:
`data[start:stop]`. When slicing in pandas the start bound is included in the
output. The stop bound is one step BEYOND the row you want to select. So if you
want to select rows 0, 1 and 2 your code would look like this:

In [8]:
# select rows 0,1,2 (but not 3)
articles_df[0:3]

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,The Fisher Thermodynamics of Quasi-Probabilities,Flavia Pennini|Angelo Plastino,10.3390/e17127853,https://doaj.org/article/b75e8d5cca3f46cbbd63e...,Fisher information|quasi-probabilities|complem...,1099-4300,"Entropy, Vol 17, Iss 12, Pp 7848-7858 (2015)",1,1,2,Flavia Pennini,4,1,11,2015
1,1,Aflatoxin Contamination of the Milk Supply: A ...,Naveed Aslam|Peter C. Wynn,10.3390/agriculture5041172,https://doaj.org/article/0edc5af6672641c0bd456...,aflatoxins|AFM1|AFB1|milk marketing chains|hep...,2077-0472,"Agriculture (Basel), Vol 5, Iss 4, Pp 1172-118...",1,1,2,Naveed Aslam,5,1,11,2015
2,2,Metagenomic Analysis of Upwelling-Affected Bra...,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...,10.3390/ijms161226101,https://doaj.org/article/d9fe469f75a0442382b84...,PKS|NRPS|metagenomics|environmental genomics|u...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,3,Rafael R. C. Cuadrat,8,1,11,2015


{: .source}

The stop bound in Python is different from what you might be used to in
languages like Matlab and R.

In [10]:
# select the first, second and third rows from the articles_df
articles_df[0:3]

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,The Fisher Thermodynamics of Quasi-Probabilities,Flavia Pennini|Angelo Plastino,10.3390/e17127853,https://doaj.org/article/b75e8d5cca3f46cbbd63e...,Fisher information|quasi-probabilities|complem...,1099-4300,"Entropy, Vol 17, Iss 12, Pp 7848-7858 (2015)",1,1,2,Flavia Pennini,4,1,11,2015
1,1,Aflatoxin Contamination of the Milk Supply: A ...,Naveed Aslam|Peter C. Wynn,10.3390/agriculture5041172,https://doaj.org/article/0edc5af6672641c0bd456...,aflatoxins|AFM1|AFB1|milk marketing chains|hep...,2077-0472,"Agriculture (Basel), Vol 5, Iss 4, Pp 1172-118...",1,1,2,Naveed Aslam,5,1,11,2015
2,2,Metagenomic Analysis of Upwelling-Affected Bra...,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...,10.3390/ijms161226101,https://doaj.org/article/d9fe469f75a0442382b84...,PKS|NRPS|metagenomics|environmental genomics|u...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,3,Rafael R. C. Cuadrat,8,1,11,2015


In [11]:
# select the first 5 rows (rows 0,1,2,3,4)
articles_df[:5]

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,The Fisher Thermodynamics of Quasi-Probabilities,Flavia Pennini|Angelo Plastino,10.3390/e17127853,https://doaj.org/article/b75e8d5cca3f46cbbd63e...,Fisher information|quasi-probabilities|complem...,1099-4300,"Entropy, Vol 17, Iss 12, Pp 7848-7858 (2015)",1,1,2,Flavia Pennini,4,1,11,2015
1,1,Aflatoxin Contamination of the Milk Supply: A ...,Naveed Aslam|Peter C. Wynn,10.3390/agriculture5041172,https://doaj.org/article/0edc5af6672641c0bd456...,aflatoxins|AFM1|AFB1|milk marketing chains|hep...,2077-0472,"Agriculture (Basel), Vol 5, Iss 4, Pp 1172-118...",1,1,2,Naveed Aslam,5,1,11,2015
2,2,Metagenomic Analysis of Upwelling-Affected Bra...,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...,10.3390/ijms161226101,https://doaj.org/article/d9fe469f75a0442382b84...,PKS|NRPS|metagenomics|environmental genomics|u...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,3,Rafael R. C. Cuadrat,8,1,11,2015
3,3,Synthesis and Reactivity of a Cerium(III) Scor...,Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...,10.3390/inorganics3040534,https://doaj.org/article/95606ed39deb4f43b96f7...,lanthanide|cerium|scorpionate|tris(pyrazolyl)b...,2304-6740,"Inorganics (Basel), Vol 3, Iss 4, Pp 534-553 (...",1,1,4,Fabrizio Ortu,5,1,11,2015
4,4,Performance and Uncertainty Evaluation of Snow...,Magali Troin|Richard Arsenault|François Brissette,10.3390/hydrology2040289,https://doaj.org/article/18b1d70730d44573ab5c2...,snow models|hydrological models|snowmelt|uncer...,2306-5338,"Hydrology, Vol 2, Iss 4, Pp 289-317 (2015)",1,1,3,Magali Troin,4,1,11,2015


In [13]:
# select the last row of the DataFrame
articles_df[-1:]

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
1000,1000,Metagenomic Analysis of Upwelling-Affected Bra...,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...,10.3390/ijms161226101,https://doaj.org/article/d9fe469f75a0442382b84...,PKS|NRPS|metagenomics|environmental genomics|u...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,3,Rafael R. C. Cuadrat,8,1,11,2015


{: .source}

We can also reassign values within subsets of our DataFrame.
But before we do that, let's make a
copy of our DataFrame so as not to modify our original imported data.

In [16]:
# copy the articles DataFrame so we don't modify the original DataFrame
articles_copy = articles_df

# set the first three rows of data in the DataFrame to 0
articles_copy[0:3] = 0
articles_copy

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,3,Synthesis and Reactivity of a Cerium(III) Scor...,Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...,10.3390/inorganics3040534,https://doaj.org/article/95606ed39deb4f43b96f7...,lanthanide|cerium|scorpionate|tris(pyrazolyl)b...,2304-6740,"Inorganics (Basel), Vol 3, Iss 4, Pp 534-553 (...",1,1,4,Fabrizio Ortu,5,1,11,2015
4,4,Performance and Uncertainty Evaluation of Snow...,Magali Troin|Richard Arsenault|François Brissette,10.3390/hydrology2040289,https://doaj.org/article/18b1d70730d44573ab5c2...,snow models|hydrological models|snowmelt|uncer...,2306-5338,"Hydrology, Vol 2, Iss 4, Pp 289-317 (2015)",1,1,3,Magali Troin,4,1,11,2015
5,5,Dihydrochalcone Compounds Isolated from Crabap...,Xiaoxiao Qin|Yun Feng Xing|Zhiqin Zhou|Yuncong...,10.3390/molecules201219754,https://doaj.org/article/5765b418183c4b70bb0b7...,Malus crabapples|leaves|dihydrochalcone compou...,1420-3049,"Molecules, Vol 20, Iss 12, Pp 21193-21203 (2015)",1,1,4,Xiaoxiao Qin,4,1,11,2015
6,6,Ionic Liquids as Carbene Catalyst Precursors i...,Anton Axelsson|Linda Ta|Henrik Sundén,10.3390/catal5042052,https://doaj.org/article/d1d39464834447c8bd9c2...,ionic liquid|NHC|OTHO|multicomponent reaction|...,2073-4344,"Catalysts, Vol 5, Iss 4, Pp 2052-2067 (2015)",1,1,3,Anton Axelsson,4,1,11,2015
7,7,Characterization of Aspartate Kinase from Cory...,Weihong Min|Huiying Li|Hongmei Li|Chunlei Liu|...,10.3390/ijms161226098,https://doaj.org/article/253cd7d35aa34a8eaa264...,Corynebacterium pekinense|aspartate kinase|cha...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,5,Weihong Min,8,1,11,2015
8,8,Quaternifications and Extensions of Current Al...,Tosiaki Kori|Yuto Imai,10.3390/sym7042150,https://doaj.org/article/83bb4f8f7d09467da9778...,infinite dimensional lie algebras|current alge...,2073-8994,"Symmetry, Vol 7, Iss 4, Pp 2150-2180 (2015)",1,1,2,Tosiaki Kori,4,1,11,2015
9,9,Imaging of HCC—Current State of the Art,Christina Schraml|Sascha Kaufmann|Hansjoerg Re...,10.3390/diagnostics5040513,https://doaj.org/article/39227747725f45acbe245...,hepatocellular carcinoma|magnetic resonance im...,2075-4418,"Diagnostics, Vol 5, Iss 4, Pp 513-545 (2015)",2,1,7,Christina Schraml,4,1,11,2015


{: .source}

Next, try the following code:

In [17]:
articles_copy.head()

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,3,Synthesis and Reactivity of a Cerium(III) Scor...,Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...,10.3390/inorganics3040534,https://doaj.org/article/95606ed39deb4f43b96f7...,lanthanide|cerium|scorpionate|tris(pyrazolyl)b...,2304-6740,"Inorganics (Basel), Vol 3, Iss 4, Pp 534-553 (...",1,1,4,Fabrizio Ortu,5,1,11,2015
4,4,Performance and Uncertainty Evaluation of Snow...,Magali Troin|Richard Arsenault|François Brissette,10.3390/hydrology2040289,https://doaj.org/article/18b1d70730d44573ab5c2...,snow models|hydrological models|snowmelt|uncer...,2306-5338,"Hydrology, Vol 2, Iss 4, Pp 289-317 (2015)",1,1,3,Magali Troin,4,1,11,2015


In [18]:
articles_df.head()

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,3,Synthesis and Reactivity of a Cerium(III) Scor...,Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...,10.3390/inorganics3040534,https://doaj.org/article/95606ed39deb4f43b96f7...,lanthanide|cerium|scorpionate|tris(pyrazolyl)b...,2304-6740,"Inorganics (Basel), Vol 3, Iss 4, Pp 534-553 (...",1,1,4,Fabrizio Ortu,5,1,11,2015
4,4,Performance and Uncertainty Evaluation of Snow...,Magali Troin|Richard Arsenault|François Brissette,10.3390/hydrology2040289,https://doaj.org/article/18b1d70730d44573ab5c2...,snow models|hydrological models|snowmelt|uncer...,2306-5338,"Hydrology, Vol 2, Iss 4, Pp 289-317 (2015)",1,1,3,Magali Troin,4,1,11,2015


{: .source}
What is the difference between the two data frames?

## Referencing Objects vs Copying Objects in Python
We might have thought that we were creating a fresh copy of the `articles_df` object when we
used the code `articles_copy = articles_df`. However, the statement `y = x` doesn’t create a copy of our DataFrame.
It creates a new variable `y` that refers to the **same** object that `x` refers to. This means that there is only one object
(the DataFrame), and both `x` and `y` refer to it. So when we assign the value 0 to the first 3 rows using the
`articles_copy` DataFrame, the `articles_df` DataFrame is modified too. To create a fresh copy of the `articles_df`
DataFrame we use the syntax `y = x.copy()`. But first we have to read `articles_df` in again, because the current version contains the unintentional changes made to the first 3 rows.

In [21]:
articles_df = pd.read_csv("data/articles.csv")
articles_copy = articles_df.copy()

{: .source}
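
One way to check whether two names refer to the same object is Python's `is` operator. The sketch below uses a small made-up DataFrame (the values are placeholders, and the names `original`, `alias` and `independent` are chosen only for this example) to contrast plain assignment with `.copy()`:

```python
import pandas as pd

original = pd.DataFrame({"Author_Count": [2, 2, 3], "Year": [2015, 2015, 2015]})

alias = original                 # a second name for the same object
independent = original.copy()    # a new object with its own data

print(alias is original)         # True
print(independent is original)   # False

# changing the copy leaves the original untouched
independent.loc[0, "Author_Count"] = 0
print(original.loc[0, "Author_Count"])

# changing the alias changes the original, because they are one object
alias.loc[0, "Author_Count"] = 99
print(original.loc[0, "Author_Count"])
```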

## Slicing Subsets of Rows and Columns in Python

We can select specific ranges of our data in both the row and column directions
using either label or integer-based indexing.

- `loc`: indexing via *labels* (integers are treated as index labels, not positions)
- `iloc`: indexing via *integer positions*

To select a subset of rows AND columns from our DataFrame, we can use the `iloc`
method. For example, we can select Title, Authors and DOI (columns 2, 3 and 4 if we
start counting at 1), like this:

In [22]:
articles_df.iloc[0:3, 1:4]

Unnamed: 0,Title,Authors,DOI
0,The Fisher Thermodynamics of Quasi-Probabilities,Flavia Pennini|Angelo Plastino,10.3390/e17127853
1,Aflatoxin Contamination of the Milk Supply: A ...,Naveed Aslam|Peter C. Wynn,10.3390/agriculture5041172
2,Metagenomic Analysis of Upwelling-Affected Bra...,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...,10.3390/ijms161226101


{: .source}

{: .output}

Notice that we asked for a slice from 0:3. This yielded 3 rows of data. When you
ask for 0:3, you are actually telling Python to start at index 0 and select rows
0, 1, 2 **up to but not including 3**.

Let's next explore some other ways to index and select subsets of data:

In [26]:
# select all columns for rows of index values 0 and 10
articles_df.loc[[0, 10], :]


Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,The Fisher Thermodynamics of Quasi-Probabilities,Flavia Pennini|Angelo Plastino,10.3390/e17127853,https://doaj.org/article/b75e8d5cca3f46cbbd63e...,Fisher information|quasi-probabilities|complem...,1099-4300,"Entropy, Vol 17, Iss 12, Pp 7848-7858 (2015)",1,1,2,Flavia Pennini,4,1,11,2015
10,10,Synthesis and Complexation of Well-Defined Lab...,Mark Billing|Tobias Rudolph|Eric Täuscher|Rain...,10.3390/polym7121526,https://doaj.org/article/515fc66a42e84bdeb8ebd...,atom transfer radical polymerization|intense c...,2073-4360,"Polymers, Vol 7, Iss 12, Pp 2478-2493 (2015)",1,1,5,Mark Billing,4,1,11,2015


In [27]:
# what does this do?
articles_df.loc[0, ['Authors', 'ISSNs', 'Title']]


Authors                      Flavia Pennini|Angelo Plastino
ISSNs                                             1099-4300
Title      The Fisher Thermodynamics of Quasi-Probabilities
Name: 0, dtype: object

In [28]:
# What happens when you type the code below?
articles_df.loc[[0, 10, 35549], :]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_tuple(key)


Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0.0,The Fisher Thermodynamics of Quasi-Probabilities,Flavia Pennini|Angelo Plastino,10.3390/e17127853,https://doaj.org/article/b75e8d5cca3f46cbbd63e...,Fisher information|quasi-probabilities|complem...,1099-4300,"Entropy, Vol 17, Iss 12, Pp 7848-7858 (2015)",1.0,1.0,2.0,Flavia Pennini,4.0,1.0,11.0,2015.0
10,10.0,Synthesis and Complexation of Well-Defined Lab...,Mark Billing|Tobias Rudolph|Eric Täuscher|Rain...,10.3390/polym7121526,https://doaj.org/article/515fc66a42e84bdeb8ebd...,atom transfer radical polymerization|intense c...,2073-4360,"Polymers, Vol 7, Iss 12, Pp 2478-2493 (2015)",1.0,1.0,5.0,Mark Billing,4.0,1.0,11.0,2015.0
35549,,,,,,,,,,,,,,,,


{: .source}
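
The `.reindex()` alternative mentioned in the warning can be sketched as follows, on a made-up DataFrame (the titles and years are placeholders). In recent pandas releases the `.loc` call with a missing label is expected to raise a `KeyError` instead of warning, so the sketch catches it:

```python
import pandas as pd

toy_df = pd.DataFrame(
    {"Title": ["Paper one", "Paper two", "Paper three"], "Year": [2015, 2015, 2016]}
)

# reindex keeps the requested labels and fills rows for missing labels with NaN
reindexed = toy_df.reindex([0, 2, 35549])
print(reindexed)

# .loc with a missing label in the list raises KeyError in recent pandas
try:
    toy_df.loc[[0, 2, 35549], :]
    raised = False
except KeyError:
    raised = True
print(raised)
```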

NOTE: Labels must be found in the DataFrame or you will get a *KeyError*
(recent pandas releases raise it even when only one label in a list is missing).
When slicing with `loc`, the start bound and the stop bound are both
**included**. Integers *can* also be used with `loc`, but they refer to the index
label and not the position. Thus when you use `loc` and select 1:4, you will get
a different result than using `iloc` to select rows 1:4.
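
The difference between labels and positions is easiest to see when the index labels are not 0, 1, 2, and so on. The sketch below builds a made-up DataFrame whose index labels start at 10 (an assumption chosen purely for illustration):

```python
import pandas as pd

toy_df = pd.DataFrame(
    {"Title": ["Paper one", "Paper two", "Paper three", "Paper four"]},
    index=[10, 11, 12, 13],
)

# iloc slices by position: start included, stop excluded
by_position = toy_df.iloc[1:3]
print(list(by_position.index))

# loc slices by label: both start and stop are included
by_label = toy_df.loc[11:13]
print(list(by_label.index))

# on the default 0, 1, 2, ... index the two differ only in the stop bound
default_df = toy_df.reset_index(drop=True)
print(len(default_df.iloc[1:3]), len(default_df.loc[1:3]))
```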

We can also select a specific data value according to the specific row and
column location within the data frame using the `iloc` function:
`df.iloc[row,column]`.

In [29]:
articles_df.iloc[2,1]

'Metagenomic Analysis of Upwelling-Affected Brazilian Coastal Seawater Reveals Sequence Domains of Type I PKS and Modular NRPS'

{: .source}

{: .output}

Remember that Python indexing begins at 0. So, the index location [2, 1] selects
the element that is 3 rows down and in the second column of the DataFrame.

> ## Challenge Activities
>
> 1. What happens when you type:
>
> ~~~
> articles_df[0:3]
> articles_df[:5]
> articles_df[-1:]
> ~~~
> {: .source}
>
> 2. What happens when you call:
>     - `articles_df.iloc[0:4, 1:4]`
>     - `articles_df.loc[0:4, 1:4]`
>     - How are the two commands different?
{: .challenge}

## Subsetting Data Using Criteria

We can also select a subset of our data using criteria. For example, we can
select all rows that have a single author.

In [30]:
articles_df[articles_df.Author_Count==1]

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
15,15,Performance-Based Cognitive Screening Instrume...,Andrew J. Larner,10.3390/diagnostics5040504,https://doaj.org/article/115729fe9cad481681cd5...,diagnosis|cognitive screening instruments|deme...,2075-4418,"Diagnostics, Vol 5, Iss 4, Pp 504-512 (2015)",2,1,1,Andrew J. Larner,4,1,11,2015
27,27,Comments on Ekino et al. Cloning and Character...,Leopoldo Palma,10.3390/toxins7124865,https://doaj.org/article/45c6d07238a24bd9b8426...,n/a|Biology (General)|QH301-705.5|Science|Q,2072-6651,"Toxins, Vol 7, Iss 12, Pp 5094-5095 (2015)",1,1,1,Leopoldo Palma,4,1,11,2015
64,64,The Ubiquity of Humanity and Textuality in Hum...,Daihyun Chung,10.3390/h4040885,https://doaj.org/article/e4b3bd6870ad4a6fac75f...,humanities|questions|languages|integrationalit...,2076-0787,"Humanities , Vol 4, Iss 4, Pp 885-904 (2015)",2,1,1,Daihyun Chung,4,1,11,2015
70,70,Effect of Water Nutrient Pollution on Long-Ter...,Robert E. Melchers,10.3390/ma8125443,https://doaj.org/article/b7b738cd706146b79c2e0...,long-term corrosion|copper alloys|Cu-Ni|bi-mod...,1996-1944,"Materials, Vol 8, Iss 12, Pp 8047-8058 (2015)",1,1,1,Robert E. Melchers,4,1,11,2015
82,82,Drought Management Strategies in Spain,Pilar Paneque,10.3390/w7126655,https://doaj.org/article/50abcf3617654523a5646...,Water Framework Directive|water policies|risk|...,2073-4441,"Water, Vol 7, Iss 12, Pp 6689-6701 (2015)",1,1,1,Pilar Paneque,4,1,11,2015
88,88,Establishment and Applied Research on aWetland...,Han-Shen Chen,10.3390/su71215785,https://doaj.org/article/deb39beb36c2476ca1ac2...,wetlands|ecosystem|energy ecological footprint...,2071-1050,"Sustainability, Vol 7, Iss 12, Pp 15785-15793 ...",2,1,1,Han-Shen Chen,5,1,11,2015
95,95,Primality Testing and Factorization by using F...,Musha Takaaki,,https://doaj.org/article/5965d9a2ceb643c5b44a3...,Primality testing|prime factorization|Fourier ...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,Musha Takaaki,9,1,11,2015
101,101,Some Perturbed Ostrowski Type Inequalities for...,S. S. Dragomir,,https://doaj.org/article/3e19700969ad43d78130f...,Ostrowski’s inequality|Integral inequalities|P...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,S. S. Dragomir,9,1,11,2015
104,104,A Mixed Integer Linear Programming Formulation...,Marija Ivanović,,https://doaj.org/article/1dea7b63a59a487c8305f...,Restrained Roman domination in graphs|combinat...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,Marija Ivanović,9,1,11,2015
135,135,Properties of Stabilizing Computations,Mark Burgin,,https://doaj.org/article/18e5b1d5f58842f7820d5...,computation|stability|Turing machine|inductive...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,Mark Burgin,8,1,4,2015


{: .source}

{: .output}
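
The expression inside the square brackets is itself an object: a Series of `True`/`False` values, often called a boolean mask. A minimal sketch on a made-up DataFrame (the values are placeholders; only the column names `First_Author` and `Author_Count` are taken from the lesson data):

```python
import pandas as pd

toy_df = pd.DataFrame({
    "First_Author": ["A. Author", "B. Writer", "C. Editor", "D. Reader"],
    "Author_Count": [1, 3, 1, 2],
})

# the comparison is evaluated row by row and yields one boolean per row
mask = toy_df["Author_Count"] == 1
print(mask.tolist())

# indexing with the mask keeps only the rows where it is True
single_author = toy_df[mask]
print(list(single_author.index))

# True counts as 1, so summing the mask counts the matching rows
matches = int(mask.sum())
print(matches)
```

Storing the mask in a variable, as here, makes it easy to inspect or reuse before applying it to the DataFrame.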

Or we can select all rows whose author count is not 1 (in practice, those with more than one author).

In [31]:
articles_df[articles_df.Author_Count != 1]

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,The Fisher Thermodynamics of Quasi-Probabilities,Flavia Pennini|Angelo Plastino,10.3390/e17127853,https://doaj.org/article/b75e8d5cca3f46cbbd63e...,Fisher information|quasi-probabilities|complem...,1099-4300,"Entropy, Vol 17, Iss 12, Pp 7848-7858 (2015)",1,1,2,Flavia Pennini,4,1,11,2015
1,1,Aflatoxin Contamination of the Milk Supply: A ...,Naveed Aslam|Peter C. Wynn,10.3390/agriculture5041172,https://doaj.org/article/0edc5af6672641c0bd456...,aflatoxins|AFM1|AFB1|milk marketing chains|hep...,2077-0472,"Agriculture (Basel), Vol 5, Iss 4, Pp 1172-118...",1,1,2,Naveed Aslam,5,1,11,2015
2,2,Metagenomic Analysis of Upwelling-Affected Bra...,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...,10.3390/ijms161226101,https://doaj.org/article/d9fe469f75a0442382b84...,PKS|NRPS|metagenomics|environmental genomics|u...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,3,Rafael R. C. Cuadrat,8,1,11,2015
3,3,Synthesis and Reactivity of a Cerium(III) Scor...,Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...,10.3390/inorganics3040534,https://doaj.org/article/95606ed39deb4f43b96f7...,lanthanide|cerium|scorpionate|tris(pyrazolyl)b...,2304-6740,"Inorganics (Basel), Vol 3, Iss 4, Pp 534-553 (...",1,1,4,Fabrizio Ortu,5,1,11,2015
4,4,Performance and Uncertainty Evaluation of Snow...,Magali Troin|Richard Arsenault|François Brissette,10.3390/hydrology2040289,https://doaj.org/article/18b1d70730d44573ab5c2...,snow models|hydrological models|snowmelt|uncer...,2306-5338,"Hydrology, Vol 2, Iss 4, Pp 289-317 (2015)",1,1,3,Magali Troin,4,1,11,2015
5,5,Dihydrochalcone Compounds Isolated from Crabap...,Xiaoxiao Qin|Yun Feng Xing|Zhiqin Zhou|Yuncong...,10.3390/molecules201219754,https://doaj.org/article/5765b418183c4b70bb0b7...,Malus crabapples|leaves|dihydrochalcone compou...,1420-3049,"Molecules, Vol 20, Iss 12, Pp 21193-21203 (2015)",1,1,4,Xiaoxiao Qin,4,1,11,2015
6,6,Ionic Liquids as Carbene Catalyst Precursors i...,Anton Axelsson|Linda Ta|Henrik Sundén,10.3390/catal5042052,https://doaj.org/article/d1d39464834447c8bd9c2...,ionic liquid|NHC|OTHO|multicomponent reaction|...,2073-4344,"Catalysts, Vol 5, Iss 4, Pp 2052-2067 (2015)",1,1,3,Anton Axelsson,4,1,11,2015
7,7,Characterization of Aspartate Kinase from Cory...,Weihong Min|Huiying Li|Hongmei Li|Chunlei Liu|...,10.3390/ijms161226098,https://doaj.org/article/253cd7d35aa34a8eaa264...,Corynebacterium pekinense|aspartate kinase|cha...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,5,Weihong Min,8,1,11,2015
8,8,Quaternifications and Extensions of Current Al...,Tosiaki Kori|Yuto Imai,10.3390/sym7042150,https://doaj.org/article/83bb4f8f7d09467da9778...,infinite dimensional lie algebras|current alge...,2073-8994,"Symmetry, Vol 7, Iss 4, Pp 2150-2180 (2015)",1,1,2,Tosiaki Kori,4,1,11,2015
9,9,Imaging of HCC—Current State of the Art,Christina Schraml|Sascha Kaufmann|Hansjoerg Re...,10.3390/diagnostics5040513,https://doaj.org/article/39227747725f45acbe245...,hepatocellular carcinoma|magnetic resonance im...,2075-4418,"Diagnostics, Vol 5, Iss 4, Pp 513-545 (2015)",2,1,7,Christina Schraml,4,1,11,2015


{: .source}

We can define sets of criteria too:

In [40]:
articles_df[(articles_df.Month >= 7) & (articles_df.Year <= 2015)]

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,The Fisher Thermodynamics of Quasi-Probabilities,Flavia Pennini|Angelo Plastino,10.3390/e17127853,https://doaj.org/article/b75e8d5cca3f46cbbd63e...,Fisher information|quasi-probabilities|complem...,1099-4300,"Entropy, Vol 17, Iss 12, Pp 7848-7858 (2015)",1,1,2,Flavia Pennini,4,1,11,2015
1,1,Aflatoxin Contamination of the Milk Supply: A ...,Naveed Aslam|Peter C. Wynn,10.3390/agriculture5041172,https://doaj.org/article/0edc5af6672641c0bd456...,aflatoxins|AFM1|AFB1|milk marketing chains|hep...,2077-0472,"Agriculture (Basel), Vol 5, Iss 4, Pp 1172-118...",1,1,2,Naveed Aslam,5,1,11,2015
2,2,Metagenomic Analysis of Upwelling-Affected Bra...,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...,10.3390/ijms161226101,https://doaj.org/article/d9fe469f75a0442382b84...,PKS|NRPS|metagenomics|environmental genomics|u...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,3,Rafael R. C. Cuadrat,8,1,11,2015
3,3,Synthesis and Reactivity of a Cerium(III) Scor...,Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...,10.3390/inorganics3040534,https://doaj.org/article/95606ed39deb4f43b96f7...,lanthanide|cerium|scorpionate|tris(pyrazolyl)b...,2304-6740,"Inorganics (Basel), Vol 3, Iss 4, Pp 534-553 (...",1,1,4,Fabrizio Ortu,5,1,11,2015
4,4,Performance and Uncertainty Evaluation of Snow...,Magali Troin|Richard Arsenault|François Brissette,10.3390/hydrology2040289,https://doaj.org/article/18b1d70730d44573ab5c2...,snow models|hydrological models|snowmelt|uncer...,2306-5338,"Hydrology, Vol 2, Iss 4, Pp 289-317 (2015)",1,1,3,Magali Troin,4,1,11,2015
5,5,Dihydrochalcone Compounds Isolated from Crabap...,Xiaoxiao Qin|Yun Feng Xing|Zhiqin Zhou|Yuncong...,10.3390/molecules201219754,https://doaj.org/article/5765b418183c4b70bb0b7...,Malus crabapples|leaves|dihydrochalcone compou...,1420-3049,"Molecules, Vol 20, Iss 12, Pp 21193-21203 (2015)",1,1,4,Xiaoxiao Qin,4,1,11,2015
6,6,Ionic Liquids as Carbene Catalyst Precursors i...,Anton Axelsson|Linda Ta|Henrik Sundén,10.3390/catal5042052,https://doaj.org/article/d1d39464834447c8bd9c2...,ionic liquid|NHC|OTHO|multicomponent reaction|...,2073-4344,"Catalysts, Vol 5, Iss 4, Pp 2052-2067 (2015)",1,1,3,Anton Axelsson,4,1,11,2015
7,7,Characterization of Aspartate Kinase from Cory...,Weihong Min|Huiying Li|Hongmei Li|Chunlei Liu|...,10.3390/ijms161226098,https://doaj.org/article/253cd7d35aa34a8eaa264...,Corynebacterium pekinense|aspartate kinase|cha...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,5,Weihong Min,8,1,11,2015
8,8,Quaternifications and Extensions of Current Al...,Tosiaki Kori|Yuto Imai,10.3390/sym7042150,https://doaj.org/article/83bb4f8f7d09467da9778...,infinite dimensional lie algebras|current alge...,2073-8994,"Symmetry, Vol 7, Iss 4, Pp 2150-2180 (2015)",1,1,2,Tosiaki Kori,4,1,11,2015
9,9,Imaging of HCC—Current State of the Art,Christina Schraml|Sascha Kaufmann|Hansjoerg Re...,10.3390/diagnostics5040513,https://doaj.org/article/39227747725f45acbe245...,hepatocellular carcinoma|magnetic resonance im...,2075-4418,"Diagnostics, Vol 5, Iss 4, Pp 513-545 (2015)",2,1,7,Christina Schraml,4,1,11,2015

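When combining criteria, pandas uses the element-wise operators `&` (and), `|` (or) and `~` (not) rather than Python's `and`, `or` and `not`. Each comparison needs its own parentheses, because these operators bind more tightly than `==` or `>=`. A small sketch with made-up values (only the column names come from the lesson data):

```python
import pandas as pd

toy_df = pd.DataFrame({
    "Author_Count": [2, 2, 1, 5],
    "LanguageId": [3, 1, 3, 1],
    "Month": [11, 4, 7, 12],
})

two_authors = toy_df["Author_Count"] == 2
language_3 = toy_df["LanguageId"] == 3

# rows meeting both criteria
both = toy_df[two_authors & language_3]
# rows meeting at least one criterion
either = toy_df[two_authors | language_3]
# every row except those meeting both criteria
not_both = toy_df[~(two_authors & language_3)]

print(list(both.index), list(either.index), list(not_both.index))

# written inline, each comparison needs its own parentheses
late_multi = toy_df[(toy_df["Month"] >= 7) & (toy_df["Author_Count"] > 1)]
print(list(late_multi.index))
```
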

In [3]:
articles_df[~((articles_df.Author_Count == 2) & (articles_df.LanguageId==3))]

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,0,The Fisher Thermodynamics of Quasi-Probabilities,Flavia Pennini|Angelo Plastino,10.3390/e17127853,https://doaj.org/article/b75e8d5cca3f46cbbd63e...,Fisher information|quasi-probabilities|complem...,1099-4300,"Entropy, Vol 17, Iss 12, Pp 7848-7858 (2015)",1,1,2,Flavia Pennini,4,1,11,2015
1,1,Aflatoxin Contamination of the Milk Supply: A ...,Naveed Aslam|Peter C. Wynn,10.3390/agriculture5041172,https://doaj.org/article/0edc5af6672641c0bd456...,aflatoxins|AFM1|AFB1|milk marketing chains|hep...,2077-0472,"Agriculture (Basel), Vol 5, Iss 4, Pp 1172-118...",1,1,2,Naveed Aslam,5,1,11,2015
2,2,Metagenomic Analysis of Upwelling-Affected Bra...,Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...,10.3390/ijms161226101,https://doaj.org/article/d9fe469f75a0442382b84...,PKS|NRPS|metagenomics|environmental genomics|u...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,3,Rafael R. C. Cuadrat,8,1,11,2015
3,3,Synthesis and Reactivity of a Cerium(III) Scor...,Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...,10.3390/inorganics3040534,https://doaj.org/article/95606ed39deb4f43b96f7...,lanthanide|cerium|scorpionate|tris(pyrazolyl)b...,2304-6740,"Inorganics (Basel), Vol 3, Iss 4, Pp 534-553 (...",1,1,4,Fabrizio Ortu,5,1,11,2015
4,4,Performance and Uncertainty Evaluation of Snow...,Magali Troin|Richard Arsenault|François Brissette,10.3390/hydrology2040289,https://doaj.org/article/18b1d70730d44573ab5c2...,snow models|hydrological models|snowmelt|uncer...,2306-5338,"Hydrology, Vol 2, Iss 4, Pp 289-317 (2015)",1,1,3,Magali Troin,4,1,11,2015
5,5,Dihydrochalcone Compounds Isolated from Crabap...,Xiaoxiao Qin|Yun Feng Xing|Zhiqin Zhou|Yuncong...,10.3390/molecules201219754,https://doaj.org/article/5765b418183c4b70bb0b7...,Malus crabapples|leaves|dihydrochalcone compou...,1420-3049,"Molecules, Vol 20, Iss 12, Pp 21193-21203 (2015)",1,1,4,Xiaoxiao Qin,4,1,11,2015
6,6,Ionic Liquids as Carbene Catalyst Precursors i...,Anton Axelsson|Linda Ta|Henrik Sundén,10.3390/catal5042052,https://doaj.org/article/d1d39464834447c8bd9c2...,ionic liquid|NHC|OTHO|multicomponent reaction|...,2073-4344,"Catalysts, Vol 5, Iss 4, Pp 2052-2067 (2015)",1,1,3,Anton Axelsson,4,1,11,2015
7,7,Characterization of Aspartate Kinase from Cory...,Weihong Min|Huiying Li|Hongmei Li|Chunlei Liu|...,10.3390/ijms161226098,https://doaj.org/article/253cd7d35aa34a8eaa264...,Corynebacterium pekinense|aspartate kinase|cha...,1422-0067,"International Journal of Molecular Sciences, V...",1,1,5,Weihong Min,8,1,11,2015
8,8,Quaternifications and Extensions of Current Al...,Tosiaki Kori|Yuto Imai,10.3390/sym7042150,https://doaj.org/article/83bb4f8f7d09467da9778...,infinite dimensional lie algebras|current alge...,2073-8994,"Symmetry, Vol 7, Iss 4, Pp 2150-2180 (2015)",1,1,2,Tosiaki Kori,4,1,11,2015
9,9,Imaging of HCC—Current State of the Art,Christina Schraml|Sascha Kaufmann|Hansjoerg Re...,10.3390/diagnostics5040513,https://doaj.org/article/39227747725f45acbe245...,hepatocellular carcinoma|magnetic resonance im...,2075-4418,"Diagnostics, Vol 5, Iss 4, Pp 513-545 (2015)",2,1,7,Christina Schraml,4,1,11,2015


{: .source}

# Python Syntax Cheat Sheet

You can use the syntax below when querying data from a DataFrame. Experiment
with selecting various subsets of the articles data.

* Equals: `==`
* Not equals: `!=`
* Greater than, less than: `>` or `<`
* Greater than or equal to: `>=`
* Less than or equal to: `<=`
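
As a minimal sketch of these operators in use, the example below applies them to a small, made-up DataFrame. The name `toy_df` and its values are invented for illustration; only the column names are borrowed from the articles data.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the lesson, values are invented.
toy_df = pd.DataFrame({
    "Author_Count": [1, 2, 3, 5],
    "LanguageId": [1, 3, 1, 3],
})

# Each comparison returns a boolean Series with one value per row.
is_two = toy_df["Author_Count"] == 2
not_two = toy_df["Author_Count"] != 2
at_least_three = toy_df["Author_Count"] >= 3

# Combine conditions with & (and) or | (or); wrap each one in parentheses.
few_authors_or_lang3 = toy_df[
    (toy_df["Author_Count"] <= 2) | (toy_df["LanguageId"] == 3)
]
print(few_authors_or_lang3)
```

The parentheses around each condition matter because `&` and `|` bind more tightly than the comparison operators in Python.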

> ## Challenge Activities
>
> 1. Select a subset of rows in the `articles_df` DataFrame that contain articles
>    from at least 2 authors in Spanish (`LanguageId=3`). How many rows did you
>    end up with? What did your neighbor get?
> 2. You can use the `isin` method in pandas to query a DataFrame based upon a
>    list of values as follows:
>    `articles_df[articles_df['ISSNs'].isin([listGoesHere])]`. Use the `isin` method
>    to find all articles from particular ISSNs. How many records did you get?
> 3. Experiment with other queries. Create a query that finds all rows with
>    an `Author_Count` of 0 or less.
> 4. The `~` symbol can be used to return the OPPOSITE of the selection that
>    you specify. It is equivalent to **is not in**.
>    Write a query that selects all rows that are NOT in English (`LanguageId=1`).
{: .challenge}

# Using Masks

A mask can be useful to locate where a particular subset of values exists or
doesn't exist, for example NaN ("Not a Number") values. To understand masks,
we also need to understand *BOOLEAN* objects in Python.

A Boolean value is either `True` or `False`. For example:

In [44]:
# set x to 5
x = 5
print(x)
# what does the code below return?
print(x > 5)
# how about this?
print(x == 5)

5
False
True


{: .source}

When we ask Python what the value of `x > 5` is, we get `False`. This is because `x`
is not greater than 5; it is equal to 5. To create a boolean mask, you first define the
True / False criterion (e.g. values > 5 = True). Each value in the object is then
assessed to determine whether it meets the criterion (True) or
not (False). The result is an output object that is the same shape as
the original object, but with a True or False value for each index location.
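
The idea can be sketched on a small pandas Series. The values below are invented for illustration and are not taken from the articles data.

```python
import pandas as pd

# Hypothetical values, invented for illustration.
values = pd.Series([3, 5, 8, 5, 10])

# The criterion is applied element by element.
mask = values > 5
print(mask.tolist())               # [False, False, True, False, True]

# The mask has the same shape as the original object.
print(mask.shape == values.shape)  # True

# Using the mask as an index keeps only the positions marked True.
print(values[mask].tolist())       # [8, 10]
```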

Let's try this out. Let's identify all locations in the articles data that have
null (missing or NaN) data values. We can use the `isnull` function to do this.
Each cell with a null value will be assigned a value of *True* in the new
boolean object.

In [45]:
pd.isnull(articles_df)

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
5,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
6,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
7,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
8,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
9,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


{: .source}

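To select the rows where there are null values, we can reduce the mask to one
True / False value per row and use it as an index to subset our data as follows:

In [46]:
# To select just the rows with NaN values, we can use the .any method;
# axis=1 marks a row True if at least one of its columns is null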
articles_df[pd.isnull(articles_df).any(axis=1)]

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
95,95,Primality Testing and Factorization by using F...,Musha Takaaki,,https://doaj.org/article/5965d9a2ceb643c5b44a3...,Primality testing|prime factorization|Fourier ...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,Musha Takaaki,9,1,11,2015
96,96,Some Families of q-Series Identities and Assoc...,H. M. Srivastava|S. N. Singh|S. P. Singh,,https://doaj.org/article/8ef413065e064a5b9fa37...,q-Series|q-Identities|q-Series identities|Roge...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,3,H. M. Srivastava,9,1,11,2015
97,97,Zweier I-Convergent Double Sequence Spaces Def...,A. Khan Vakeel| Khan Nazneen|Yasmeen,,https://doaj.org/article/51e4d8baa99f45be91f75...,Ideal|filter|double sequence|sequence of modul...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,3,A. Khan Vakeel,9,1,11,2015
98,98,Initial Maclaurin Coecients Bounds for New Sub...,Basem Aref Frasin|Tariq Al-Hawary,,https://doaj.org/article/7d449001593d41349d931...,Analytic and univalent functions|Bi-univalent ...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,2,Basem Aref Frasin,9,1,11,2015
99,99,Measure of Tessellation Quality of Voronoï Meshes,E. A-iyeh|J.F. Peters,,https://doaj.org/article/ac14c9c370014bd69ece8...,Sites|Mesh Generation|Quality|Tessellations|Vo...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,2,E. A-iyeh,9,1,11,2015
100,100,Katsaras’s Type Fuzzy Norm under Triangular Norms,Sorin Nădăban|Tudor Bînzar|Flavius Pater|Carme...,,https://doaj.org/article/6f48cbb09c49415b92340...,Fuzzy norm|fuzzy norm linear spaces|fuzzy subs...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,5,Sorin Nădăban,9,1,11,2015
101,101,Some Perturbed Ostrowski Type Inequalities for...,S. S. Dragomir,,https://doaj.org/article/3e19700969ad43d78130f...,Ostrowski’s inequality|Integral inequalities|P...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,S. S. Dragomir,9,1,11,2015
102,102,Hadamard Product of Certain Harmonic Univalent...,R. M. El-Ashwah|B. A. Frasin,,https://doaj.org/article/a68b345808244e8f9d3ef...,Harmonic functions|meromorphic functions|univa...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,2,R. M. El-Ashwah,9,1,11,2015
103,103,New Čebyšev Type Inequalities for Functions wh...,B. Meftah|K. Boukerrioua,,https://doaj.org/article/16923e727f35469d816aa...,Čebyšev type inequalities|co-ordinates (s1|m1)...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,2,B. Meftah,9,1,11,2015
104,104,A Mixed Integer Linear Programming Formulation...,Marija Ivanović,,https://doaj.org/article/1dea7b63a59a487c8305f...,Restrained Roman domination in graphs|combinat...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,Marija Ivanović,9,1,11,2015


{: .source}

We can run `isnull` on a particular column too. What does the code below do?

In [47]:
# what does this do?
no_doi = articles_df[pd.isnull(articles_df['DOI'])]

In [48]:
no_doi

Unnamed: 0,id,Title,Authors,DOI,URL,Subjects,ISSNs,Citation,LanguageId,LicenceId,Author_Count,First_Author,Citation_Count,Day,Month,Year
95,95,Primality Testing and Factorization by using F...,Musha Takaaki,,https://doaj.org/article/5965d9a2ceb643c5b44a3...,Primality testing|prime factorization|Fourier ...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,Musha Takaaki,9,1,11,2015
96,96,Some Families of q-Series Identities and Assoc...,H. M. Srivastava|S. N. Singh|S. P. Singh,,https://doaj.org/article/8ef413065e064a5b9fa37...,q-Series|q-Identities|q-Series identities|Roge...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,3,H. M. Srivastava,9,1,11,2015
97,97,Zweier I-Convergent Double Sequence Spaces Def...,A. Khan Vakeel| Khan Nazneen|Yasmeen,,https://doaj.org/article/51e4d8baa99f45be91f75...,Ideal|filter|double sequence|sequence of modul...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,3,A. Khan Vakeel,9,1,11,2015
98,98,Initial Maclaurin Coecients Bounds for New Sub...,Basem Aref Frasin|Tariq Al-Hawary,,https://doaj.org/article/7d449001593d41349d931...,Analytic and univalent functions|Bi-univalent ...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,2,Basem Aref Frasin,9,1,11,2015
99,99,Measure of Tessellation Quality of Voronoï Meshes,E. A-iyeh|J.F. Peters,,https://doaj.org/article/ac14c9c370014bd69ece8...,Sites|Mesh Generation|Quality|Tessellations|Vo...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,2,E. A-iyeh,9,1,11,2015
100,100,Katsaras’s Type Fuzzy Norm under Triangular Norms,Sorin Nădăban|Tudor Bînzar|Flavius Pater|Carme...,,https://doaj.org/article/6f48cbb09c49415b92340...,Fuzzy norm|fuzzy norm linear spaces|fuzzy subs...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,5,Sorin Nădăban,9,1,11,2015
101,101,Some Perturbed Ostrowski Type Inequalities for...,S. S. Dragomir,,https://doaj.org/article/3e19700969ad43d78130f...,Ostrowski’s inequality|Integral inequalities|P...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,S. S. Dragomir,9,1,11,2015
102,102,Hadamard Product of Certain Harmonic Univalent...,R. M. El-Ashwah|B. A. Frasin,,https://doaj.org/article/a68b345808244e8f9d3ef...,Harmonic functions|meromorphic functions|univa...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,2,R. M. El-Ashwah,9,1,11,2015
103,103,New Čebyšev Type Inequalities for Functions wh...,B. Meftah|K. Boukerrioua,,https://doaj.org/article/16923e727f35469d816aa...,Čebyšev type inequalities|co-ordinates (s1|m1)...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,2,B. Meftah,9,1,11,2015
104,104,A Mixed Integer Linear Programming Formulation...,Marija Ivanović,,https://doaj.org/article/1dea7b63a59a487c8305f...,Restrained Roman domination in graphs|combinat...,2067-2764|2247-6202,Theory and Applications of Mathematics & Compu...,1,2,1,Marija Ivanović,9,1,11,2015


{: .source}

Let's take a minute to look at the statement above. We are using the Boolean
object as an index. We are asking Python to select the rows that have a `NaN` value
for DOI (Digital Object Identifier) and storing them in `no_doi`.
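
The same pattern can be sketched on a tiny, made-up DataFrame where the result is easy to check by eye. The name `toy_df`, its titles and its DOI strings are all invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the four rows have no DOI.
toy_df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "DOI": ["10.1/a", np.nan, "10.1/c", np.nan],
})

# Boolean Series: True where DOI is missing.
doi_missing = pd.isnull(toy_df["DOI"])

# Rows without a DOI, and (using ~ to invert the mask) rows with one.
no_doi = toy_df[doi_missing]
has_doi = toy_df[~doi_missing]

# Summing a boolean mask counts the True values.
print(doi_missing.sum())
```

Because `True` counts as 1 and `False` as 0, summing the mask is a quick way to see how many rows a selection will return before subsetting.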


> ## Challenges
>
> 1. Create a new DataFrame that only contains observations with Languages
>    that are *NOT* English.
> 2. Create a new DataFrame that contains only observations where the author
>    count is greater than 2. Create a stacked bar plot of average number of
>    authors by language with values stacked for each publisher.
{: .challenge}