## Exercise 5.4
### 1. loc and iloc for rows

With [`loc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) and [`iloc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) you can perform practically any data selection operation on a DataFrame. [`loc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) is label-based: you specify rows and columns by their row and column labels. [`iloc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) is integer-position based: you specify rows and columns by their zero-based integer position, as you did in the previous exercise.
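
As a minimal sketch of the difference, the snippet below builds a small made-up frame (`df`, its columns, and its labels are illustrative assumptions, not the exercise data) and selects the same row once by label and once by position.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration
df = pd.DataFrame(
    {"height": [10, 20, 30], "color": ["red", "green", "blue"]},
    index=["a", "b", "c"],
)

by_label = df.loc["b"]        # row whose label is "b"
by_position = df.iloc[1]      # row at integer position 1 (the second row)
two_rows = df.loc[["a", "c"]]  # a list of labels returns a DataFrame

print(by_label.equals(by_position))
print(type(by_label).__name__)
print(type(two_rows).__name__)
```

A single label or position yields a Series, while a list of labels or positions yields a DataFrame, even when the list has only one element.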
####  Instructions (4 points)
- Print out information about the DataFrame using the `info()` method.
- Print out the index of the DataFrame.
- Use [`iloc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) to select the observation corresponding to Japan as a Series. The label of this row is `JAP`, and its integer position is `2`. Make sure to print the resulting Series.
- Use [`loc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) to select the observations for Australia (labeled `AUS`) and Egypt (labeled `EG`) as a DataFrame. Make sure to print the resulting DataFrame.

In [2]:
# Import cars data
import pandas as pd
cars = pd.read_csv('https://github.com/huangpen77/BUDT704/raw/main/Chapter06/cars.csv', index_col = 0)
# info() prints its summary itself and returns None, so print() also emits "None"
print(cars.info(), '\n')
print(cars.index, '\n')

# Print out observation for Japan
print(cars.iloc[2], '\n')

# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])

<class 'pandas.core.frame.DataFrame'>
Index: 7 entries, US to EG
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   cars_per_cap  7 non-null      int64 
 1   country       7 non-null      object
 2   drives_right  7 non-null      bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 175.0+ bytes
None 

Index(['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'], dtype='object') 

cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object 

     cars_per_cap    country  drives_right
AUS           731  Australia         False
EG             45      Egypt          True


### 2. loc and iloc for rows and columns
[`loc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) and [`iloc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) also allow you to select both rows and columns from a DataFrame. 
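
The sketch below illustrates combined row and column selection on a made-up frame (`df` and its labels are assumptions for illustration only). It also shows one detail that is easy to miss: label slices in `loc` include both endpoints, whereas position slices in `iloc` exclude the stop position.

```python
import pandas as pd

# Hypothetical frame, not the exercise data
df = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40], "z": [True, False, True, False]},
    index=["a", "b", "c", "d"],
)

value = df.loc["c", "y"]             # single cell selected by labels
same_value = df.iloc[2, 1]           # the same cell selected by positions

sub_loc = df.loc["b":"c", "y":"z"]   # label slices include both endpoints
sub_iloc = df.iloc[1:3, 1:3]         # position slices exclude the stop

print(value == same_value)
print(sub_loc.equals(sub_iloc))
print(sub_loc.shape)
```

Two bare labels or positions return the cell's value itself; slices or lists return a sub-DataFrame.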

####  Instructions (2 points)

- Print out the `drives_right` value of the row corresponding to Morocco (its row label is `MOR`) using [`iloc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) 
- Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns `country` and `drives_right` using [`loc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing).

In [3]:
# Print out drives_right value of Morocco
# (list indexers return a 1x1 DataFrame rather than the bare value)
print(cars.iloc[[5], [2]], '\n')

# Print sub-DataFrame
print(cars.loc['RU':'MOR', 'country':'drives_right'])

     drives_right
MOR          True 

     country  drives_right
RU    Russia          True
MOR  Morocco          True


### 3. loc and iloc for columns
It's also possible to select only columns with [`loc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) and [`iloc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing). In both cases, you simply put a slice going from beginning to end in front of the comma:

```python
cars.loc[:, ['country','drives_right']]
cars.iloc[:, [1, 2]]
```
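
Whether the result is a Series or a DataFrame depends on what follows the comma. The sketch below, on a small made-up frame (`df` and its column names are illustrative assumptions), contrasts a bare integer with a one-element list.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
df = pd.DataFrame({"p": [1.5, 2.5], "q": ["u", "v"]}, index=["r1", "r2"])

as_series = df.iloc[:, 1]    # bare integer after the comma: a Series
as_frame = df.iloc[:, [1]]   # one-element list after the comma: a DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The same rule applies to `loc` with a bare label versus a list of labels.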

####  Instructions (3 points)

- Print out the `drives_right` column as a Series using [`iloc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing).
- Print out the `drives_right` column as a DataFrame using [`iloc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing).
- Print out both the `cars_per_cap` and `drives_right` columns as a DataFrame using [`loc`](https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing).

In [4]:
# Print out drives_right column as Series
print(cars.iloc[:, 2], '\n')

# Print out drives_right column as DataFrame
print(cars.iloc[:,[2]], '\n')

# Print out cars_per_cap and drives_right as DataFrame
print(cars.loc[:, ['cars_per_cap', 'drives_right']])

US      True
AUS    False
JAP    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool 

     drives_right
US           True
AUS         False
JAP         False
IN          False
RU           True
MOR          True
EG           True 

     cars_per_cap  drives_right
US            809          True
AUS           731         False
JAP           588         False
IN             18         False
RU            200          True
MOR            70          True
EG             45          True


## Exercise 5.5
### 1. Dictionary to DataFrame (1)

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

In the exercises that follow you will be working with vehicle data from different countries. Each observation corresponds to a country and the columns give information about the number of vehicles per capita, whether people drive left or right, and so on.

Three lists are defined in the script:

- `names`, containing the country names for which data is available.
- `dr`, a list of booleans indicating whether people drive on the right (`True`) or on the left (`False`) in the corresponding country.
- `cpc`, the number of motor vehicles per 1000 people in the corresponding country.

#### Instructions (2 points)

- Use the pre-defined lists to create a dictionary called `my_dict`. There should be three key value pairs:
    - key `'country'` and value `names`.
    - key `'drives_right'` and value `dr`.
    - key `'cars_per_cap'` and value `cpc`.
- Use `pd.DataFrame()` to turn your dict into a DataFrame called `cars`.

In [27]:
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Import pandas
import pandas as pd

# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {'country':names, 'drives_right': dr, 'cars_per_cap': cpc}

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
print(cars)

         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45


#### Instructions (1 point)

- Specify the row labels by setting `cars.index` equal to `row_labels`.

In [28]:
# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars.index = row_labels

# Print cars again
print(cars)

           country  drives_right  cars_per_cap
US   United States          True           809
AUS      Australia         False           731
JPN          Japan         False           588
IN           India         False            18
RU          Russia          True           200
MOR        Morocco          True            70
EG           Egypt          True            45


### 2. Dictionary to DataFrame (2)
In this exercise, you will create a dataframe from a dictionary where the data is entered row by row.
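
As a rough illustration of what "row by row" means, the sketch below uses a made-up dictionary (`inventory` and the column names are assumptions, not part of the exercise). With `orient="index"`, each key becomes a row label; the plain constructor would instead treat each key as a column. Passing `columns=` to `from_dict` is one optional way to name the columns in the same step.

```python
import pandas as pd

# Hypothetical data: each key is a row label, each list holds that row's values
inventory = {"apple": ["fruit", 3], "carrot": ["vegetable", 7]}

# Keys become row labels; columns= names the columns in the same call
by_row = pd.DataFrame.from_dict(
    inventory, orient="index", columns=["kind", "count"]
)

# For comparison: the plain constructor treats each key as a column
by_col = pd.DataFrame(inventory)

print(by_row.index.tolist())
print(by_row.columns.tolist())
print(by_col.columns.tolist())
```

Without `columns=`, the columns receive the default integer labels `0` and `1`, which can be replaced afterwards by assigning to `.columns`.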

#### Instructions (4 points)
- Use the predefined dict `europe` to create a DataFrame in which each entry of the dict is a row.
- Make a list consisting of the strings `'capital'` and `'population'`, and assign it to the column labels of the DataFrame.

In [29]:
europe = { 'spain': ['madrid', 46.77 ],
           'france': ['paris', 66.03 ],
           'germany': ['berlin', 80.62],
           'norway': ['oslo', 5.084 ] }
# use predefined dict 'europe' to create a dataframe, where each entry in the dict is a row.
europe_df = pd.DataFrame.from_dict(europe, orient='index')
europe_df

              0      1
spain    madrid  46.77
france    paris  66.03
germany  berlin  80.62
norway     oslo  5.084


In [30]:
# Make a list consisting of the strings 'capital' and 'population', and assign it to the column labels of the DataFrame.
cols = ['capital', 'population']
europe_df.columns = cols
europe_df

        capital  population
spain    madrid       46.77
france    paris       66.03
germany  berlin       80.62
norway     oslo       5.084


### 3. DataFrame from a list of tuples

#### Instructions (2 points)
- Create a DataFrame `students` from a list of tuples, using `Name`, `Age`, and `Score` as the columns. Use the `DataFrame.from_records()` method.

In [35]:
data = [('Peter', 18, 7),
        ('Riff', 15, 6),
        ('John', 17, 8),
        ('Michel', 18, 7),
        ('Sheli', 17, 5) ]

# create DataFrame using data
students = pd.DataFrame.from_records(data, columns=['Name','Age','Score'])
  
print(students) 

     Name  Age  Score
0   Peter   18      7
1    Riff   15      6
2    John   17      8
3  Michel   18      7
4   Sheli   17      5


### 4. Deleting Data
#### Instructions (2 points)
- Drop the students with index labels 2 and 4 from the `students` DataFrame.
- Drop the column `Score` from the `students` DataFrame.

In [37]:
# drop() returns a new DataFrame without rows 2 and 4; `students` itself is unchanged
students.drop([2, 4])

     Name  Age  Score
0   Peter   18      7
1    Riff   15      6
3  Michel   18      7


In [38]:
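# Returns a new DataFrame without the Score column; rows 2 and 4 are still
# present because the previous drop() result was not assigned back
students.drop('Score', axis=1)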

     Name  Age
0   Peter   18
1    Riff   15
2    John   17
3  Michel   18
4   Sheli   17
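
By default, `drop()` returns a new DataFrame and leaves the original untouched. The sketch below demonstrates this on a small made-up frame (`df` and its columns are illustrative assumptions) and shows that the result has to be assigned to a name for the deletion to be kept.

```python
import pandas as pd

# Hypothetical frame, used only for illustration
df = pd.DataFrame({"name": ["Ann", "Bo", "Cy"], "points": [3, 5, 4]})

trimmed = df.drop([1])                  # new frame without the row labeled 1
no_points = df.drop(columns="points")   # same effect as drop("points", axis=1)

print(df.shape)                  # the original still has every row and column
print(trimmed.index.tolist())
print(no_points.columns.tolist())

# Chain the calls and keep the result to apply both deletions
kept = df.drop([1]).drop(columns="points")
print(kept.shape)
```

Assigning the result back to the original name (for example `df = df.drop(...)`) is the usual way to make a deletion persist.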
