---   

<h1 align="center">Introduction to Data Analyst and Data Science for beginners</h1>
<h1 align="center">Lecture no 2.14(Pandas-05)</h1>

---
<h3><div align="right">Ehtisham Sadiq</div></h3>    

## _Indexing, Subsetting and Slicing Dataframes.ipynb_

## Motivation:
- The ability to select specific rows and columns, and to filter data based on specific conditions, are two key features of Pandas.
    - **Selection** allows you to access specific rows or columns (a subset) of the data by their index and/or location in the DataFrame
        - In large datasets, you may be required to select the first/last N records
        - In large datasets, you may be required to select a range (n to m) of records
        - In large datasets, you may be required to select specific columns of your interest
        - In large datasets, you may be required to select specific range and specific columns of your interest
    - **Filtering** allows you to access specific rows or columns (a subset) of the data based on one or more conditions
        - In a medical dataset, you may be required to filter the records of all patients who suffer from a specific disease or who have a specific blood group
        - In a medical dataset, you may be required to filter pregnant women who have anemia, and compare this subset to women who don’t have anemia.
        - In a travel dataset, you may be required to filter hotels inside Lahore city, sorted by their minimum per day cost
        - In a client dataset, you may be required to filter the clients who use a Gmail account (may require a string filter)
        - In a client dataset, you may be required to filter the clients who belong to specific countries (may require use of the `.isin()` method)

## Learning agenda of this notebook
1. Understanding Indices of a Dataframe
    - Understand the Dataset
    - Changing the Column Indices of a Dataframe
    - Changing the Row Indices of a Dataframe
2. Selecting Row(s) and Column(s) of a Dataframe using `df[]` 
3. Selecting Rows and Columns using `iloc` Method
4. Selecting Rows and Columns using `loc` Method
5. Conditional Selection   
6. Selecting columns of a specific data type


## 1. Understanding Indices of a Dataframe

<img align="right" width="300" height="300"  src="images/series-anatomy.png"  >
<img align="left" width="500" height="500"  src="images/dataframe.png"  >

Series:
- We have seen that a Series is a 1-D array-like object capable of holding a sequence of values of any data type.
- Every data value of a Series has an explicit row index associated with it, which can be a numeric value as well as a string.
- We have also seen that we can use these indices for three purposes: accessing elements, subsetting, and slicing.

DataFrame:
- A DataFrame is a 2-D labeled data structure, so it has two indices. The row index runs from top to bottom and is associated with the rows; the column index runs from left to right and is associated with the columns. By default, the row index starts at 0 and ends at n-1, where n is the total number of rows; these values are integer identifiers for the rows. Similarly, when no column labels are supplied, the column index starts at 0 and ends at m-1, where m is the total number of columns; these values are integer identifiers for the columns.
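
The default indices described above can be seen on a tiny dataframe. This is only a minimal sketch: the names `demo` and `labelled` and the sample values are made up for illustration and are not part of the lecture dataset.

```python
import pandas as pd

# A tiny frame built from a list of rows, with no labels supplied (hypothetical data)
demo = pd.DataFrame([["Ali", 21], ["Sara", 23], ["Omar", 22]])

print(demo.index)    # default row index: positions 0..n-1
print(demo.columns)  # default column index: positions 0..m-1

# Meaningful column labels can be supplied at creation time
labelled = pd.DataFrame([["Ali", 21], ["Sara", 23]], columns=["name", "age"])
print(labelled.columns.tolist())
```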

###  a. Understand the Dataset
- Let us first understand the dataframe on which we are going to work in today's notebook

In [None]:
import numpy as np
import pandas as pd
df = pd.read_csv('datasets/groupdata.csv')
df

In [None]:
# The `shape` attribute of a dataframe returns a two-value tuple: (number of rows, number of columns)
# Note the rows count does not include the column labels and column count does not include the row index
df.shape

In [None]:
# The `index` attribute of a dataframe returns the row index (row labels) along with its datatype
df.index

In [None]:
# The `columns` attribute of a dataframe returns the column labels along with their datatype
df.columns

In [None]:
# The `dtypes` attribute of a dataframe returns the data type of each column in the dataframe
df.dtypes

In [None]:
# The `info()` method prints information about a DataFrame including the row index, column labels,
# non-null values count in each column, datatypes and memory usage
df.info()

In [None]:
# With no argument, `describe()` displays descriptive statistics for the numeric columns only;
# passing `include='all'` summarizes the columns of every data type
df.describe(include='all')

### b. Changing the Column Indices/Labels of a Dataframe
- Every dataframe has column labels associated with its columns
- These by default are integer values from 0,1,2,3...
- However, while creating a dataframe from scratch, or while reading them from a file you can set them to more meaningful string values.
- While reading from csv file the first row in the file is taken as the column labels
- We can change the column labels, if we want
- Let us practically see this for better understanding

In [None]:
import pandas as pd
df = pd.read_csv('datasets/groupdata.csv')
df.head()

In [None]:
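# This file has no row of column labels, so by default its first data row is taken as the column labels
df = pd.read_csv('datasets/groupdatawithoutcollables.csv')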
df.head()


In [None]:
# To read such files, you have to pass the parameter `header=None` to the `read_csv()` method
df = pd.read_csv('datasets/groupdatawithoutcollables.csv', header=None)
df.head()


>Let us suppose we have above dataframe, in which the column indices are just integer values associated with the position of every column. We want to assign some meaningful names to the columns for better understanding. There are many options or ways to do that.

**Changing Column Indices/Labels:** Assign a list of column labels to the `columns` attribute of the dataframe

In [None]:
col_names = ['roll no', 'name', 'age', 'address', 'session', 'group', 'gender', 'subj1', 'subj2', 'scholarship']
df.columns = col_names
df.head()

>Note that in the above dataframe, first column name has a space, which is a bit difficult to use sometimes, so if you want to change value of a specific column label, you can use the `df.rename()` method

In [None]:
# You pass a dictionary object to the columns argument to rename() method
# The key is the old column name, while the value is the new column name
df1 = df.rename(columns={'roll no': 'rollno'}, inplace=False)

In [None]:
df1.head(3)

>Last but not least, another way is to assign appropriate column labels to your dataframe by passing a list of column names to the `names` argument of the `pd.read_csv()` function. Try it on your own :)
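
As a starting point for that exercise, here is a minimal sketch of the `names` argument. It reads from an in-memory string instead of a file so that it runs on its own; `csv_text`, `students` and the three column names are assumptions made for illustration, not the lecture dataset.

```python
import io
import pandas as pd

# Hypothetical CSV text with no header row (stands in for a file on disk)
csv_text = "1,Ali,21\n2,Sara,23\n"

col_names = ["rollno", "name", "age"]

# header=None says the file has no label row; names supplies the labels
students = pd.read_csv(io.StringIO(csv_text), header=None, names=col_names)
print(students)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file path.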

### c. Changing the Row Indices/Labels of a  Dataframe
- Every dataframe has row index associated with its rows
- These by default are integer values from 0,1,2,3...
- However, while creating a dataframe from scratch you may set them to some meaningful string values (seldom required).
- We have already seen this in our previous session
- Today, we will see two methods that work on row indices of a Pandas Dataframe named `df.set_index()` and `df.reset_index()`

In [None]:
# Let us load the dataset again into dataframe
import pandas as pd
df = pd.read_csv('datasets/groupdata.csv')
df.head(3)

In [None]:
df.index

>Let us suppose we have the above dataframe, in which the row indices are just integer values associated with the position of every row. We want to assign more meaningful indices to the rows. Suppose we want to set the values of the column `roll no` as the index of this dataframe. So we do not want positional indices; rather, we want meaningful string indices, which are the roll numbers of the students in this case.

**Changing Row Indices:** The `df.set_index()` method can be used to change row index of a dataframe using an existing column(s)
`df.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)`

- Where
    - `keys` is the column label (or a list of column labels) to use as the new row index
    - `drop=True`, to drop the column that is now being used as the row index from the data part of the dataframe
    - `append=False`, Set it to True if you want to append columns to existing index
    - `inplace=False`, Set it to True to make changes in the original dataframe, i.e., do not create a new object
    - `verify_integrity=False`, Set it to True to check the new index for duplicates. Default value of False will improve the performance of this method.

Returns Dataframe if `inplace=False` or None if `inplace=True`
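
The `verify_integrity` argument is the one option above that is easy to overlook. The sketch below uses a made-up `students` frame with a deliberately repeated roll number to show the difference; the data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data: the roll number 'ms02' appears twice on purpose
students = pd.DataFrame({"rollno": ["ms01", "ms02", "ms02"],
                         "name": ["Ali", "Sara", "Omar"]})

# Default (verify_integrity=False): duplicate labels in the new index are accepted
by_roll = students.set_index("rollno")
print(by_roll.index.is_unique)

# With verify_integrity=True, pandas raises a ValueError on duplicate labels
try:
    students.set_index("rollno", verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print("duplicate labels rejected:", err)
```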

In [None]:
df1 = df.set_index(keys='roll no', drop=False)
df1.head(3)

Note: The `roll no` column still exists as part of the dataframe because we passed `drop=False`. To drop it, set `drop=True` (the default).

In [None]:
df1 = df.set_index(keys='roll no', drop=True)
df1.head(3)

Another point to note is that no change has been made to the original dataframe, since the `inplace` argument is False by default. Let us verify this

In [None]:
df.head()

Instead of returning a new dataframe, the `df.set_index()` method can change the row index in place. Let us do that now

In [None]:
df.set_index(keys='roll no', drop=True, inplace=True)
df.head()

>Note that the `roll no` column has become the index now; it is no longer part of the data of the dataframe

In [None]:
df.index

**Changing Row Indices Back to Positional:** The `df.reset_index()` method is used to reset the row index of the dataframe back to positional integer indices

`df.reset_index(drop=False, inplace=False)`
Where
- `drop=False`, By default the old index is inserted as a column of the dataframe. Set it to True to discard the old index and simply reset to the default integer index.
- `inplace=False`, Set it to True to modify the DataFrame in place (do not create a new object).

Returns Dataframe if `inplace=False` or None if `inplace=True`

In [None]:
df.head()

In [None]:
df.shape

In [None]:
# reset the index
df2 = df.reset_index()
df2.index

In [None]:
df2.head(3)

In [None]:
df3 = df2.reset_index()

In [None]:
df3

## 2. Selecting Column(s) and Row(s) of a Dataframe using `df[]` 

<img align="right" width="400" height="400"  src="images/groupdata.png"  >

- Consider this dataframe, which is sorted by age. Note that the row indices have been shuffled by the sort, so they no longer correspond to positional indices (which we normally visualize as 0,1,2,3...)
- To access column(s) of a dataframe:
    - To access single column, mention the column index/label inside `[]`, which in this case are strings, however, can be integer values as well. It will return a new Series object.
    - To access multiple columns, pass column indices/labels as a list inside `[]`. It will return a new Dataframe object.
- To access row(s) of a dataframe:
    - Mention the **positional** row indices as a slice object `[start:stop:step]`, (In this case the positional indices do not match with the actual row indices). It will return a new Dataframe object.
        - `start`: specifies from where the slicing should start, inclusive (default is 0) 
        - `stop`: specifies where it has to stop, exclusive (default is end of the array) 
        - `step`:  is by-default 1
    
    
**Note:** 
- You cannot use two positional subscripts such as `df[2][5]` to access a specific element of a dataframe, as you can with NumPy arrays.
- You cannot get a subset of a dataframe w.r.t. rows and columns at the same time using `df[]`; it returns either a subset of columns only or a subset of rows only.
- We will soon see the `df.loc[]` and `df.iloc[]` indexers, which provide a simpler, more elegant and more powerful way to subset a dataframe than the `df[]` syntax.
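
The three behaviours of `df[]` listed above can be tried on a small stand-in frame first. This is a sketch under assumptions: `demo` and its values are invented here, and sorting by `age` is used only to shuffle the row labels in the same way as in the lecture.

```python
import pandas as pd

# Hypothetical data; sorting by age leaves the row labels in the order 1, 3, 0, 2
demo = pd.DataFrame({"name": ["Ali", "Sara", "Omar", "Hina"],
                     "age": [30, 22, 41, 25]})
demo_sorted = demo.sort_values("age")

col = demo_sorted["name"]            # a single label returns a Series
sub = demo_sorted[["name", "age"]]   # a list of labels returns a DataFrame
rows = demo_sorted[1:3]              # a slice selects rows by position

print(type(col).__name__, type(sub).__name__)
print(rows.index.tolist())           # the row labels found at positions 1 and 2
```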

In [None]:
# Let us read a fresh dataframe and sort it by age column to have a clear understanding about indexing
df = pd.read_csv('datasets/groupdata.csv')
df_sorted = df.sort_values('age')
df_sorted.head()

In [None]:
df_sorted[1:2]

**Example 1:** Select the data under the column `name`. Since the column labels are strings, we write the label in quotes.

In [None]:
s1 = df_sorted['name']
print(s1)
type(s1)

> The result is a new Series object. Since this is a Series, you can chain any of the Series methods onto it, as shown below

In [None]:
s1.head()
# or
df_sorted['name'].head()

**Example 2:** To select multiple columns of a dataframe, pass a list of column names. The result is a new DataFrame object with the selected columns. 

In [None]:
# Just get the first five rows for the columns `roll no`, `gender` and `age`
d1 = df_sorted[['roll no', 'gender', 'age']].head()
d1

**Example 3:** Select the data of a single row at position 1.

In [None]:
df_sorted[1:2:1]

>Note that in the `df_sorted` dataframe, the record at position 1 is that of Muhahid, whose row label is 12

**Example 4:** Select the rows from positional index 2 to 3.

In [None]:
df_sorted[2:4]

**Example 5:** Select the rows from positional index 0, 5, 10, and 15

In [None]:
df_sorted[::5]

>Note that the output dataframe contains a subset of the original dataframe. However, the index (row labels) stays with the rows; it is not renumbered. This means that every row is identified by a row label, which remains associated with the row or record until you decide to reset the index

#### Resetting the Index of Subset of a Dataframe
- When we slice data from a dataframe, the row index of the resulting dataframe may not contain contiguous values.
- You can reset it using the `df.reset_index()` method discussed above.

In [None]:
df2 = df_sorted[5:12:2]
df2

In [None]:
df3 = df2.reset_index()
df3

>Note that the index has been reset, however, the old index is now added as a column in the dataframe. Mostly this is not required, so pass the `drop=True` argument to `reset_index()` method to avoid this.

In [None]:
df4 = df2.reset_index(drop=True)
df4

## 3. Selecting Rows and Columns using `iloc` Method
<img align="right" width="400" height="400"  src="images/groupdata.png"  >

- The **`df.iloc[]`** indexer is more powerful than **`df[]`**, as it allows you to select rows as well as columns of your choice at the same time.
- It is used for selecting rows and columns by **integer position** (0 to n-1), neither by row index value/label nor by column index value/label. So you cannot mention a column name like `age`; rather, you need to give its positional index, which is 2.
```
df.iloc[rowstoselect, colstoselect]
```
- You can place a colon in either of the two arguments to select all rows or all columns.
- Another point to keep in mind is that the indices are by position (0 to n-1) and not by actual values of row and column indices. 
- Allowed inputs within `[ , ]` are:
     - A single integer, e.g. ``5`` (note that ``5`` is interpreted as an integer position along the index).
     - A list or array of integers, e.g.  `[9, 2, 7]`.
     - A slice object with integers, e.g. ``2:9``.
     - Note that as with usual Python slices, **stop** index is not included
     - **Note:** ``.iloc`` will raise ``IndexError`` if a requested indexer is out-of-bounds, except *slice* indexers which allow out-of-bounds indexing (this conforms with python/numpy *slice* semantics).
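
The allowed inputs and the out-of-bounds rule above can be checked on a small frame. This is a minimal sketch; `demo`, its values and its row labels 10, 20, 30 are hypothetical and chosen so that labels clearly differ from positions.

```python
import pandas as pd

# Hypothetical data with row labels that are not 0, 1, 2
demo = pd.DataFrame({"name": ["Ali", "Sara", "Omar"],
                     "age": [30, 22, 41],
                     "city": ["Lahore", "Multan", "Karachi"]},
                    index=[10, 20, 30])

cell = demo.iloc[1, 2]             # row at position 1, column at position 2
picked = demo.iloc[[2, 0], 0:2]    # rows at positions 2 and 0, first two columns
tail = demo.iloc[1:10]             # an out-of-bounds slice is simply truncated

try:
    demo.iloc[5]                   # an out-of-bounds single position is an error
    raised = False
except IndexError:
    raised = True

print(cell, picked.index.tolist(), len(tail), raised)
```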

In [None]:
# Let us read a fresh dataframe and sort it by age column to have a clear understanding about indexing
import pandas as pd
df = pd.read_csv('datasets/groupdata.csv')
df_sorted = df.sort_values('age')
df_sorted.head()

### a. Selection of Rows Only

**Example 1:** Select a single row with positional index  2 and all the columns

In [None]:
df_sorted.iloc[2,:]

Note that the integer values are interpreted as row# (positional index) of the dataframe. Moreover a Series object is returned

**Example 2:** Select rows with positional indices 2, 4 and 1 and all the columns

In [None]:
df_sorted.iloc[[2,4,1], :]

**Example 3:** Select rows with positional indices from 3 to 5 (stop value is not inclusive) and all the columns

In [None]:
df_sorted.iloc[3:5, :]

>**In all the above examples, if you omit the comma and colon for the columns part, Pandas assumes it. I strongly recommend using the above style for clarity of code.**

### b. Selection of Columns Only

In [None]:
df_sorted.columns

**Example 1:** Select all the row values under the column at positional index 3

In [None]:
df_sorted.iloc[:, 3]

**Example 2:** Select all the row values under the column at positional index 1, 4, and 5

In [None]:
df_sorted.iloc[:, [1,4,5]]

**Example 3:** Select all the row values under the columns from position 2 to 4 (Note that the stop index is not inclusive)

In [None]:
df_sorted.iloc[:, 2:5]

### c. Selection of Rows + Columns
```
df.iloc[whatrowsIwant, whatcolumnsIwant]
```
- You can use a single value, a list of multiple values, or a slice object for selecting rows
- You can use a single value, a list of multiple values, or a slice object for selecting columns

**Example 1:** Select only the rows at positional index 3 and 0, and from those two rows select only columns at positional index 1 and 5

In [None]:
df_sorted.iloc[[3, 0], [1, 5]]

**Example 2:** Select only the rows at positional index 0 to 4 (stop index is not inclusive), and from those five rows select only columns at positional index 2 and 3

In [None]:
df_sorted.iloc[0:5, 2:4]

## 4. Selecting Rows and Columns using `df.loc[]` Method
<img align="right" width="400" height="400"  src="images/groupdata.png"  >

- The **`df.loc[]`** is also used for filtering rows and selecting columns but by row index value/label or by column index value/label (NOT by position). 
```
df.loc[rowstoselect, colstoselect]
```
- You can place a colon in either of the two arguments to select all rows or all columns.
- Another point to keep in mind is that the indices are NOT by position, rather by actual values of row and column indices. 
- Allowed inputs within `[ , ]` are:
     - A single label, e.g. `5` or `'a'`, (note that `5` is interpreted as actual index/label **NOT** as an integer position along the index).
     - A list or array of labels, e.g. `[9, 2, 7]` or `['ms07', 'ms02', 'ms08']`.
     - A slice object with labels, e.g. `[3:6:2]` or `['ms05':'ms09']`.
     - **Warning:** Note that contrary to usual Python slices, **both** the start and the stop are included

In [None]:
# Let us read a fresh dataframe and sort it by age column to have a clear understanding about indexing
import pandas as pd
df = pd.read_csv('datasets/groupdata.csv')
df_sorted = df.sort_values('age')
df_sorted.head()

### a. Selection of Rows Only

**Example 1:** Select a single row with row index 2 and all the columns

In [None]:
df_sorted.loc[2,:]

**Example 2:** Select rows with row indices 2, 4 and 1 and all the columns

In [None]:
df_sorted.loc[[2,4,1], :]

**Example 3:** Select rows with row indices from 5 to 3 (stop value is inclusive) and all the columns

In [None]:
df_sorted.loc[5:2, :]

In [None]:
df_sorted.loc[5:3, :]

**Question:** If you give the slice as `3:5`, it will select no row. Can anyone guess why it is so?

In [None]:
df_sorted.loc[3:5:-1, :]

>**In all the above examples, if you omit the comma and colon for the columns part, Pandas assumes it. I strongly recommend using the above style for clarity of code.**

### b. Selection of Columns Only

In [None]:
df_sorted.columns

**Example 1:** Select all the row values under the column with label `name`

In [None]:
df_sorted.loc[:, 'name']

**Example 2:** Select all the row values under the columns with labels `name`, `address`, and `scholarship`

In [None]:
df_sorted.loc[:, ['name', 'address', 'scholarship']]

**Example 3:** Select all the row values under the columns from a column range given as column labels (Note that the stop index is inclusive)

In [None]:
df_sorted.loc[:, 'address':'name']

In [None]:
df_sorted.loc[:, 'name':'address']

In [None]:
df_sorted.loc[:, 'name':'address':2]

### c. Selection of Rows + Columns
```
df.loc[whatrowsIwant, whatcolumnsIwant]
```
- You can use a single value, a list of multiple values, or a slice object for selecting rows
- You can use a single value, a list of multiple values, or a slice object for selecting columns

**Example 1:** Select rows with row indices 3 and 0, and from those two rows select only columns at column labels `name` and `address`

In [None]:
df_sorted.loc[[3, 0], ['name', 'address']]

**Example 2:** Select rows with row indices 3 to 13 (stop index is inclusive), and from those rows select columns `name`, `age` and `session`

In [None]:
df_sorted.loc[3:13, ['name', 'age', 'session']]

**Question:** You might be expecting `3:13` will return 11 rows, but it has returned only three. Can anyone guess why it is so?

In [None]:
df_sorted.loc[5:8, ['name', 'age', 'session']]

The range `5:8` has returned all the 16 rows in the dataframe :)

# After having understood all of this my recommendation is:

>**Always keep the row indices of your dataframe as 0, 1, 2, 3, 4, ... and the column indices as meaningful labels. If, after a slicing or sorting operation, the row indices are disturbed, use the `df.reset_index()` method so that your row indices match the positional indices again.**
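
A short sketch of this recommendation, on made-up data (`demo` and `tidy` are hypothetical names): after sorting, `reset_index(drop=True)` renumbers the rows so that labels and positions agree again.

```python
import pandas as pd

# Hypothetical data
demo = pd.DataFrame({"name": ["Ali", "Sara", "Omar"], "age": [30, 22, 41]})

# Sort, then renumber the rows; drop=True discards the old shuffled labels
tidy = demo.sort_values("age").reset_index(drop=True)

# Labels now equal positions, so loc and iloc agree for integer rows
print(tidy.index.tolist())
print(tidy.loc[0, "name"], tidy.iloc[0, 0])
```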

## 5. Conditional Selection
- Suppose we want to select only those rows where the age value is greater than 40. Note this time the dataframe has row indices that match with the positional indices.

<img align="left" width="400" height="400"  src="images/groupdata2.png"  >


In [None]:
# Let us read a fresh dataframe; this time we do not sort it, so the row indices match the positional indices
import pandas as pd
df = pd.read_csv('datasets/groupdata.csv')
df

### a. Option 1:
- Create a Python list having Boolean values of exact same length as the rows of the dataframe 
- The value in the list need to be True for the row which we want to select
- Optionally, convert the Python list to a Pandas Series
- Finally, pass that list (or Series) to the dataframe inside `df[]`

In [None]:
df.age

In [None]:
# Build a Boolean list: True where age is greater than 40, otherwise False
list1 = []
for age in df.age:
    if age > 40:
        list1.append(True)
    else:
        list1.append(False)
print(list1)

In [None]:
df[list1]

### b. Option 2:
- Instead of creating a Boolean list using the loop, use the condition inside the `df[cond]` operator, that will automatically generate the Boolean list.

In [None]:
df[df['age']>40]

In [None]:
df[df.age >40]

### c. Option 3:
- The best way is to use the **`df.loc[cond]`** indexer.

In [None]:
df.loc[df.age > 40]

In [None]:
# Using 'loc' gives you the facility to slice the required columns as well
df.loc[df.age > 40, ['name', 'age']]

### d. Conditional Selection based on  Multiple Conditions
- Suppose we want to get all the records of the dataframe where the age value is greater than 40 and the address is Multan
- For this, enclose each condition in parentheses and combine them with the element-wise logical operators (`&`, `|`)
```
df[(condition1) op (condition2) op (condition3)]
```
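
The pattern above can be sketched on a small invented frame before applying it to the lecture dataset. The names `demo`, `over_40` and `in_multan` are assumptions for illustration; storing each condition in a variable is just one way to keep the expression readable. The `~` operator, not mentioned above, negates a Boolean Series.

```python
import pandas as pd

# Hypothetical data
demo = pd.DataFrame({"name": ["Ali", "Sara", "Omar", "Hina"],
                     "age": [45, 22, 41, 50],
                     "address": ["Multan", "Multan", "Lahore", "Karachi"]})

# Each condition is a Boolean Series with one value per row
over_40 = demo.age > 40
in_multan = demo.address == "Multan"

both = demo[over_40 & in_multan]     # rows satisfying both conditions
either = demo[over_40 | in_multan]   # rows satisfying at least one condition
not_multan = demo[~in_multan]        # rows where the condition is False

print(both.name.tolist(), either.name.tolist(), not_multan.name.tolist())
```

When conditions are written inline, the parentheses matter because `&` and `|` bind more tightly than comparison operators such as `>` and `==`.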

In [None]:
df

In [None]:
df[(df.age < 40) & (df.address != 'Multan')]

In [None]:
df[(df.age > 40) & (df.address == 'Multan')]

In [None]:
# Select records of group A male students only
df1 = df[(df.group == 'group A') & (df.gender == 'Male')]  
df1

In [None]:
# Select the records of students who belong to Sialkot or Karachi
df1 = df[(df.address == 'Sialkot') | (df.address == 'Karachi')]
df1

In [None]:
# Select records of those who live outside Karachi and earn a scholarship greater than 7000, or who live in Peshawer
out = df[((df.address != 'Karachi') & (df.scholarship > 7000)) | (df.address == 'Peshawer')]
out


**If there are many conditions connected with the `|` (or) operator, you can simplify them using the `series.isin()` method as shown below:**

In [None]:
df[df.address.isin(['Karachi', 'Peshawer', 'Islamabad'])]
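
To see that `isin()` really is a shorthand for a chain of `|` conditions, the following sketch compares the two forms on invented data (`demo` and `cities` are hypothetical names). It also shows `~` for selecting the rows that are not in the list.

```python
import pandas as pd

# Hypothetical data
demo = pd.DataFrame({"name": ["Ali", "Sara", "Omar", "Hina"],
                     "address": ["Karachi", "Multan", "Peshawer", "Lahore"]})
cities = ["Karachi", "Peshawer", "Islamabad"]

with_isin = demo[demo.address.isin(cities)]
with_or = demo[(demo.address == "Karachi")
               | (demo.address == "Peshawer")
               | (demo.address == "Islamabad")]
outside = demo[~demo.address.isin(cities)]   # everyone NOT in the listed cities

print(with_isin.equals(with_or))
print(outside.name.tolist())
```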

## 6. Selecting columns of a specific data type
- The `df.select_dtypes()` method is used to get a subset of the dataframe to select columns of a specific datatype(s) 
```
df.select_dtypes(include, exclude)
```

- `include` and `exclude` arguments can be scalar or list-like
- At least one of these parameters must be supplied
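
A minimal sketch of both arguments on invented data follows; `demo` and its columns are hypothetical. It uses the generic `'number'` selector, which pandas accepts in addition to concrete type names such as `'float64'`.

```python
import pandas as pd

# Hypothetical data with one text, one integer and one float column
demo = pd.DataFrame({"name": ["Ali", "Sara"],
                     "age": [30, 22],
                     "scholarship": [5000.0, 7500.5]})

numeric = demo.select_dtypes(include="number")          # all numeric columns
non_numeric = demo.select_dtypes(exclude="number")      # everything else
floats_only = demo.select_dtypes(include=["float64"])   # a list of types also works

print(numeric.columns.tolist())
print(non_numeric.columns.tolist())
print(floats_only.columns.tolist())
```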

In [None]:
# Let us first check the data types of each column
df.dtypes

In [None]:
# Select the columns with object data type (categorical variables) only
df.select_dtypes(include='object')

In [None]:
# Select the columns with int64 or float64 datatype (pass a list to include several types)
df.select_dtypes(include=['int64', 'float64']).head()

## Practice Exercise no 01
### Introduction:
#### Fictional Army - Filtering and Sorting
This exercise was inspired by this [page](http://chrisalbon.com/python/)
### Step 1. Import the necessary libraries

In [1]:
# import pandas as pd
# import numpy as np

### Step 2. This is the data given as a dictionary


In [None]:
# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
            'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
            'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
            'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
            'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
            'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
            'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
            'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']}

### Step 3. Create a dataframe and assign it to a variable called army. 
##### Don't forget to include the column names in the order presented in the dictionary ('regiment', 'company', 'deaths'...) so that the column index order is consistent with the solutions. Older versions of pandas could order the columns alphabetically if they were omitted; recent versions keep the dictionary's insertion order.

In [None]:
# army = pd.DataFrame(raw_data)
# army.head(2)

### Step 4. Set the `origin` column as the index of the dataframe.

In [None]:
# army.set_index('origin',inplace=True)

In [None]:
# army.head(2)

### Step 5. Print only the column `veterans`

In [None]:
# army.veterans

### Step 6. Print the columns 'veterans' and 'deaths'

In [None]:
# army[['veterans','deaths']]

### Step 7. Print the name of all the columns.

In [None]:
# army.columns

### Step 8. Select the `deaths`,`size` and `deserters` columns from `Maine` and `Alaska`.

In [None]:
# army.loc[['Maine','Alaska'],['deaths','size','deserters']]

### Step 9. Select the rows 3 to 7 and the columns 3 to 6

In [None]:
# army.iloc[2:7,2:6]

### Step 10. Select every row after the fourth row and all columns

In [None]:
# army.iloc[4:,:]

### Step 11. Select every row up to the 4th row and all columns

In [None]:
# army.iloc[:4,:]

### Step 12. Select all the rows and the 3rd column up to the 7th column

In [None]:
# army.iloc[:,2:7]

### Step 13. Select rows where df.deaths is greater than 50

In [None]:
# army[army.deaths > 50]
# or 
# army.loc[army.deaths>50]

### Step 14. Select rows where `df.deaths` is greater than 500 or less than 50.

In [None]:
# army[(army.deaths>500) |(army.deaths<50)]
# or
# army.loc[(army.deaths>500) |(army.deaths<50)]

### Step 15. Select all the regiments not named `Dragoons`.

In [None]:
# army[army.regiment !='Dragoons']
# or 
# army.loc[army.regiment != 'Dragoons']

### Step 16. Select the rows called Texas and Arizona

In [None]:
# army.loc[['Texas','Arizona']]

### Step 17. Select the third cell in the row named Arizona

In [None]:
# army.loc[['Arizona']].iloc[:,2]

### Step 18. Select the third cell down in the column named deaths

In [None]:
# army.loc[:,'deaths'].iloc[2]

## Practice Exercise no 02
### Import the necessary libraries

In [None]:
import pandas as pd
import numpy as np

###  Import the dataset from this [address](https://raw.githubusercontent.com/bsef19m521/DatasetsForProjects/master/chipotle.tsv).

### Assign it to a variable called chipo.

In [None]:
# chipo = pd.read_csv('datasets/chipotle.tsv', sep="\t")
# chipo.head()

### Print datatype of each column and try to understand the datatype of each column carefully.

In [None]:
# chipo.info()

**Note: The datatype of `item_price` is object. We can not work with this column until it is a float. So, convert it into float datatype.**

In [None]:
# price = [float(value[1:]) for value in chipo.item_price]
# price[:5]

In [None]:
# chipo.item_price = price
# chipo.dtypes

### Delete the duplicates in item_name, choice_description and quantity using the drop_duplicates method


In [None]:
# chipo_filtered = chipo.drop_duplicates(['item_name','quantity','choice_description'])
# chipo_filtered

In [None]:
# chipo_filtered.reset_index(drop=True, inplace=True)

In [None]:
# chipo_filtered

### Select all the products with `quantity == 1`. How many products cost more than `$10.00`?

In [None]:
# chipo_filtered_product = chipo_filtered[chipo_filtered.quantity==1]


In [None]:
# chipo_filtered_product[chipo_filtered_product.item_price > 10].item_name

In [None]:
# chipo_filtered_product[chipo_filtered_product.item_price > 10].item_name.nunique()

### Step 5. What is the price of each item? 
###### print a data frame with only two columns item_name and item_price

In [None]:
# chipo_filtered[['item_name','item_price']]
# # Problem: item names are repeated here; we want only unique item names with their price

In [None]:
# chipo_filtered.groupby('item_name')[['item_price']].sum()

### Step 6. Sort the original dataframe by item_name

In [None]:
# chipo.sort_values(by='item_name')

### Step 7. What was the quantity of the most expensive item ordered?

In [None]:
# chipo.sort_values(by='item_price').tail(1)

In [None]:
# chipo.sort_values(by='item_price').tail(1).iloc[:,1]

### Step 8. How many times was a Veggie Salad Bowl ordered?

In [None]:
# chipo[chipo.item_name == "Veggie Salad Bowl"]

In [None]:
# len(chipo[chipo.item_name == "Veggie Salad Bowl"])

### Step 9. How many times did someone order more than one Canned Soda?

In [None]:
# chipo[(chipo.item_name =='Canned Soda') & (chipo.quantity >1 )]

In [None]:
# len(chipo[(chipo.item_name =='Canned Soda') & (chipo.quantity >1 )])

## Check Your Concepts:
- What is Pandas?
- How to iterate over rows in Pandas Dataframe 
- Different ways to iterate over rows in Pandas Dataframe 
- Selecting rows in pandas DataFrame based on conditions 
- Select any row from a Dataframe using iloc[] and iat[] in Pandas 
- Limited rows selection with given column in Pandas | Python 
- Drop rows from the dataframe based on certain condition applied on a column 
- Insert row at given position in Pandas Dataframe 
- Create a list from rows in Pandas dataframe 
- Create a list from rows in Pandas DataFrame | Set 2 
- Ranking Rows of Pandas DataFrame 
- Sorting rows in pandas DataFrame 
- Select row with maximum and minimum value in Pandas dataframe 
- Get all rows in a Pandas DataFrame containing given substring 
- Convert a column to row name/index in Pandas 
- How to randomly select rows from Pandas DataFrame 
 

# Pandas - Assignment no 05
- Here is link of [Pandas - Assignment no 05]()