# Exercise 6: Reading Tabular Data into DataFrames

## Aim: Learn what DataFrames are and practice using them.

### Topics covered:
- Importing the `pandas` library
- Using `pandas` to load a simple CSV data set
- Getting information about the DataFrames we create

## 1. Let's import `pandas` and make some DataFrames.

Import `pandas`, then create a DataFrame from the `data/weather.csv` file and display it.

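```python
import pandas as pd

# Load the CSV file into a DataFrame (the path is relative to the current working directory)
weather_data = pd.read_csv('../data/weather.csv')
weather_data
```

|   | Date | Time | Temp | Rainfall |
|---|------|------|------|----------|
| 0 | 2014-01-01 | 00:00 | 2.34 | 4.45 |
| 1 | 2014-01-01 | 12:00 | 6.7 | 8.34 |
| 2 | 2014-01-02 | 00:00 | -1.34 | 10.25 |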
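`pd.read_csv` accepts a file-like object as well as a path, so you can experiment without the data folder. The sketch below assumes `weather.csv` holds the `Date`, `Time`, `Temp` and `Rainfall` columns shown above. It rebuilds those three rows from an inline string with `io.StringIO`, and the names `csv_text` and `weather_sample` are illustrative only.

```python
import io

import pandas as pd

# Inline copy of the three rows shown above (assumed to match data/weather.csv)
csv_text = """Date,Time,Temp,Rainfall
2014-01-01,00:00,2.34,4.45
2014-01-01,12:00,6.7,8.34
2014-01-02,00:00,-1.34,10.25
"""

# read_csv works on file-like objects, not just file paths
weather_sample = pd.read_csv(io.StringIO(csv_text))

# Without index_col, pandas adds a default 0, 1, 2, ... row index
print(weather_sample.shape)          # (3, 4)
print(list(weather_sample.columns))  # ['Date', 'Time', 'Temp', 'Rainfall']
```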


Create a new DataFrame indexed by `Date` and display it.

```python
# index_col=0 makes the first column (Date) the row index
weather_data = pd.read_csv('../data/weather.csv', index_col=0)
weather_data
```

| Date | Time | Temp | Rainfall |
|------|------|------|----------|
| 2014-01-01 | 00:00 | 2.34 | 4.45 |
| 2014-01-01 | 12:00 | 6.7 | 8.34 |
| 2014-01-02 | 00:00 | -1.34 | 10.25 |
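`index_col=0` works here only because `Date` happens to be the first column. Passing the column name is more robust if the column order ever changes. `set_index` does the same job on a DataFrame that is already loaded. This is a sketch that again uses an inline copy of the rows shown above, which are assumed to match the file:

```python
import io

import pandas as pd

csv_text = """Date,Time,Temp,Rainfall
2014-01-01,00:00,2.34,4.45
2014-01-01,12:00,6.7,8.34
2014-01-02,00:00,-1.34,10.25
"""

# Select the index column by name rather than by position
by_name = pd.read_csv(io.StringIO(csv_text), index_col='Date')

# Equivalent: load first, then move the Date column into the index
by_set_index = pd.read_csv(io.StringIO(csv_text)).set_index('Date')

print(by_name.index.name)            # Date
print(list(by_name.columns))         # ['Time', 'Temp', 'Rainfall']
print(by_name.equals(by_set_index))  # True
```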


## 2. Let's practice using some dataframe methods.

What is the memory usage of the dataframe in bytes?

```python
weather_data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 2014-01-01 to 2014-01-02
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Time      3 non-null      object 
 1   Temp      3 non-null      float64
 2   Rainfall  3 non-null      float64
dtypes: float64(2), object(1)
memory usage: 96.0+ bytes
```


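`weather_data` uses 96.0+ bytes of memory. The `+` means the true usage could be higher, because pandas does not count the memory used by the values in `object` columns such as `Time`.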
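To measure the strings as well, pass `deep=True` to `DataFrame.memory_usage` or `memory_usage='deep'` to `info()`. The sketch below uses an inline copy of the weather rows, assumed to match the file. It compares the shallow and deep totals without assuming exact byte counts, because those vary between pandas versions and platforms.

```python
import io

import pandas as pd

csv_text = """Date,Time,Temp,Rainfall
2014-01-01,00:00,2.34,4.45
2014-01-01,12:00,6.7,8.34
2014-01-02,00:00,-1.34,10.25
"""
weather = pd.read_csv(io.StringIO(csv_text), index_col='Date')

# Bytes per column, with the index listed first under the label 'Index'
shallow = weather.memory_usage()
# deep=True also counts the Python strings held in object columns
deep = weather.memory_usage(deep=True)

print(shallow.sum(), deep.sum())

# info() can report the deep figure in its summary line as well
weather.info(memory_usage='deep')
```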

What command can you use to find the dataframe's column names?

```python
list(weather_data)
```

```
['Time', 'Temp', 'Rainfall']
```
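`list(weather_data)` works because iterating over a DataFrame yields its column labels. The `columns` attribute holds the same labels as a pandas `Index`. Here is a sketch on a small, illustrative frame built from the weather values above:

```python
import pandas as pd

# Illustrative frame with the same columns and index as weather_data
frame = pd.DataFrame(
    {'Time': ['00:00', '12:00', '00:00'],
     'Temp': [2.34, 6.7, -1.34],
     'Rainfall': [4.45, 8.34, 10.25]},
    index=pd.Index(['2014-01-01', '2014-01-01', '2014-01-02'], name='Date'),
)

# Iterating over a DataFrame yields column labels, so list() collects them
print(list(frame))  # ['Time', 'Temp', 'Rainfall']

# The columns attribute is a pandas Index; tolist() turns it into a plain list
print(frame.columns)
print(frame.columns.tolist())  # ['Time', 'Temp', 'Rainfall']
```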

Swap the rows and columns (transpose the DataFrame) and display the result.

```python
weather_data.T
```

| Date | 2014-01-01 | 2014-01-01 | 2014-01-02 |
|------|------------|------------|------------|
| Time | 00:00 | 12:00 | 00:00 |
| Temp | 2.34 | 6.7 | -1.34 |
| Rainfall | 4.45 | 8.34 | 10.25 |
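After `.T` the old index labels become the column labels. That is why `2014-01-01` appears twice: two readings share that date. When text and numbers are mixed, each transposed column has to hold both kinds of value, so pandas generally stores them with the generic `object` dtype. Transposing only numeric columns keeps a numeric dtype. Here is a sketch on an illustrative copy of the weather values:

```python
import pandas as pd

# Illustrative frame with the same columns and index as weather_data
frame = pd.DataFrame(
    {'Time': ['00:00', '12:00', '00:00'],
     'Temp': [2.34, 6.7, -1.34],
     'Rainfall': [4.45, 8.34, 10.25]},
    index=pd.Index(['2014-01-01', '2014-01-01', '2014-01-02'], name='Date'),
)

transposed = frame.T  # same as frame.transpose()

# The old column labels become the row index and vice versa
print(transposed.index.tolist())    # ['Time', 'Temp', 'Rainfall']
print(transposed.columns.tolist())  # ['2014-01-01', '2014-01-01', '2014-01-02']

# Two readings share the date 2014-01-01, so the labels are not unique
print(frame.index.is_unique)  # False

# Transposing only the numeric columns keeps the float dtype
numeric_t = frame[['Temp', 'Rainfall']].T
print(numeric_t.dtypes.unique())
```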


Find the mean and standard deviation of the weather data.

```python
# describe() summarises the numeric columns (Temp and Rainfall);
# the mean and std rows answer the question
stats = weather_data.describe()
print(stats)
```

```
           Temp   Rainfall
count  3.000000   3.000000
mean   2.566667   7.680000
std    4.024790   2.955791
min   -1.340000   4.450000
25%    0.500000   6.395000
50%    2.340000   8.340000
75%    4.520000   9.295000
max    6.700000  10.250000
```
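`describe()` computes eight statistics. If you need only the mean and standard deviation, `agg` can request just those two. Note that `std()` is the sample standard deviation (`ddof=1`), which is also what `describe()` reports. This is a sketch on the numeric weather values from the output above:

```python
import pandas as pd

# Numeric weather values copied from the output above
frame = pd.DataFrame({'Temp': [2.34, 6.7, -1.34],
                      'Rainfall': [4.45, 8.34, 10.25]})

# Compute only the two statistics the question asks for
summary = frame.agg(['mean', 'std'])
print(summary)

# std() divides by n - 1 (ddof=1); ddof=0 gives the population value instead
print(frame['Temp'].std(), frame['Temp'].std(ddof=0))
```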


## 3. Extension: Some DataFrame Challenges

1. Find the first three rows of data in `data/americas_gdp.csv` using `head()`.

```python
america_data = pd.read_csv("../data/americas_gdp.csv")
# head(n) returns the first n rows; with no argument it returns 5
america_data.head(3)
```

|   | continent | country | gdpPercap_1952 | gdpPercap_1957 | gdpPercap_1962 | gdpPercap_1967 | gdpPercap_1972 | gdpPercap_1977 | gdpPercap_1982 | gdpPercap_1987 | gdpPercap_1992 | gdpPercap_1997 | gdpPercap_2002 | gdpPercap_2007 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Americas | Argentina | 5911.315053 | 6856.856212 | 7133.166023 | 8052.953021 | 9443.038526 | 10079.02674 | 8997.897412 | 9139.671389 | 9308.41871 | 10967.28195 | 8797.640716 | 12779.37964 |
| 1 | Americas | Bolivia | 2677.326347 | 2127.686326 | 2180.972546 | 2586.886053 | 2980.331339 | 3548.097832 | 3156.510452 | 2753.69149 | 2961.699694 | 3326.143191 | 3413.26269 | 3822.137084 |
| 2 | Americas | Brazil | 2108.944355 | 2487.365989 | 3336.585802 | 3429.864357 | 4985.711467 | 6660.118654 | 7030.835878 | 7807.095818 | 6950.283021 | 7957.980824 | 8131.212843 | 9065.800825 |
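`head(n)` is shorthand for taking the first `n` rows by position, so `iloc[:n]` gives the same result. The sketch below builds a small frame from five of the rows shown above. It keeps only two of the year columns to stay short, and the name `gdp` is illustrative.

```python
import pandas as pd

# Five rows copied from americas_gdp.csv as shown above (two year columns only)
gdp = pd.DataFrame({
    'continent': ['Americas'] * 5,
    'country': ['Argentina', 'Bolivia', 'Brazil', 'Canada', 'Chile'],
    'gdpPercap_1952': [5911.315053, 2677.326347, 2108.944355, 11367.16112, 3939.978789],
    'gdpPercap_2007': [12779.37964, 3822.137084, 9065.800825, 36319.23501, 13171.63885],
})

# head() defaults to five rows; pass n to get a different number
first_three = gdp.head(3)
print(first_three)

# Positional slicing with iloc selects the same rows
print(first_three.equals(gdp.iloc[:3]))  # True
```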


2. Find the last 3 **columns** of data.

_Hint: You may need to change your view of the data first; then you can use `tail()`._

```python
# Note: tail() on its own returns the last five rows, not columns
america_data.tail()
```

|   | continent | country | gdpPercap_1952 | gdpPercap_1957 | gdpPercap_1962 | gdpPercap_1967 | gdpPercap_1972 | gdpPercap_1977 | gdpPercap_1982 | gdpPercap_1987 | gdpPercap_1992 | gdpPercap_1997 | gdpPercap_2002 | gdpPercap_2007 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Americas | Puerto Rico | 3081.959785 | 3907.156189 | 5108.34463 | 6929.277714 | 9123.041742 | 9770.524921 | 10330.98915 | 12281.34191 | 14641.58711 | 16999.4333 | 18855.60618 | 19328.70901 |
| 21 | Americas | Trinidad and Tobago | 3023.271928 | 4100.3934 | 4997.523971 | 5621.368472 | 6619.551419 | 7899.554209 | 9119.528607 | 7388.597823 | 7370.990932 | 8792.573126 | 11460.60023 | 18008.50924 |
| 22 | Americas | United States | 13990.48208 | 14847.12712 | 16173.14586 | 19530.36557 | 21806.03594 | 24072.63213 | 25009.55914 | 29884.35041 | 32003.93224 | 35767.43303 | 39097.09955 | 42951.65309 |
| 23 | Americas | Uruguay | 5716.766744 | 6150.772969 | 5603.357717 | 5444.61962 | 5703.408898 | 6504.339663 | 6920.223051 | 7452.398969 | 8137.004775 | 9230.240708 | 7727.002004 | 10611.46299 |
| 24 | Americas | Venezuela | 7689.799761 | 9802.466526 | 8422.974165 | 9541.474188 | 10505.25966 | 13143.95095 | 11152.41011 | 9883.584648 | 10733.92631 | 10165.49518 | 8605.047831 | 11415.80569 |
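The hint's approach is to transpose with `.T`, so that columns become rows, and then call `tail(3)`. On the full data this would be `america_data.T.tail(3)`. Selecting the last three columns by position with `iloc` avoids transposing and keeps the original layout. This is a sketch on a small illustrative frame built from three of the rows and the later year columns shown above:

```python
import pandas as pd

# Three rows copied from the tail() output above, keeping only the later years
gdp = pd.DataFrame({
    'country': ['Puerto Rico', 'United States', 'Uruguay'],
    'gdpPercap_1992': [14641.58711, 32003.93224, 8137.004775],
    'gdpPercap_1997': [16999.4333, 35767.43303, 9230.240708],
    'gdpPercap_2002': [18855.60618, 39097.09955, 7727.002004],
    'gdpPercap_2007': [19328.70901, 42951.65309, 10611.46299],
})

# Option 1 (the hint): transpose so columns become rows, then take the last three rows
last_cols_transposed = gdp.T.tail(3)
print(last_cols_transposed)

# Option 2: select the last three columns by position without transposing
last_cols = gdp.iloc[:, -3:]
print(last_cols)
```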


3. Use `help(america_data.to_csv)` to figure out how writing to a CSV file works.

```python
help(america_data.to_csv)
```

```
Help on method to_csv in module pandas.core.generic:

to_csv(path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, *, sep: 'str' = ',', na_rep: 'str' = '', float_format: 'str | Callable | None' = None, columns: 'Sequence[Hashable] | None' = None, header: 'bool_t | list[str]' = True, index: 'bool_t' = True, index_label: 'IndexLabel | None' = None, mode: 'str' = 'w', encoding: 'str | None' = None, compression: 'CompressionOptions' = 'infer', quoting: 'int | None' = None, quotechar: 'str' = '"', lineterminator: 'str | None' = None, chunksize: 'int | None' = None, date_format: 'str | None' = None, doublequote: 'bool_t' = True, escapechar: 'str | None' = None, decimal: 'str' = '.', errors: 'OpenFileErrors' = 'strict', storage_options: 'StorageOptions | None' = None) -> 'str | None' method of pandas.core.frame.DataFrame instance
    Write object to a comma-separated values (csv) file.

    Parameters
    ----------
    path_or_buf : str, path object, file-like ob
```
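Several of the parameters in that signature can be combined. When `path_or_buf` is left as `None`, `to_csv` returns the CSV text as a string instead of writing a file, which makes the effect of each option easy to inspect. The frame below is an illustrative two-row sample taken from the GDP data above. `lineterminator` is the spelling used by the pandas version in the help output; older releases called it `line_terminator`.

```python
import pandas as pd

# Two illustrative rows taken from the GDP data above
gdp = pd.DataFrame({
    'continent': ['Americas', 'Americas'],
    'country': ['Argentina', 'Bolivia'],
    'gdpPercap_2007': [12779.37964, 3822.137084],
})

# With no path, to_csv returns the CSV text as a string
text = gdp.to_csv(
    sep=';',                                # field separator
    columns=['country', 'gdpPercap_2007'],  # write only these columns
    index=False,                            # leave out the row index
    float_format='%.2f',                    # format floats with two decimals
    lineterminator='\n',                    # same line ending on every OS
)
print(text)
```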

4. Try writing to a CSV file using the code below (choose your own filename). Then look in the data folder and check that the file is there.
```python
america_data.to_csv('../data/new_file_name.csv')
```

```python
# to_csv writes the row index as well by default (pass index=False to leave it out)
america_data.to_csv('../data/new_file_name.csv')
```
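Because `index=True` is the default, the row labels are saved as an unnamed first column. When the file is read back, that column shows up as `Unnamed: 0` unless you pass `index_col=0` to `read_csv` or write the file with `index=False`. The sketch below round-trips through a string with `io.StringIO` instead of a real file, using an illustrative two-row frame.

```python
import io

import pandas as pd

# Two illustrative rows taken from the GDP data above
gdp = pd.DataFrame({
    'country': ['Argentina', 'Bolivia'],
    'gdpPercap_2007': [12779.37964, 3822.137084],
})

# Default index=True: the row labels come back as an extra, unnamed column
with_index = pd.read_csv(io.StringIO(gdp.to_csv()))
print(with_index.columns.tolist())  # ['Unnamed: 0', 'country', 'gdpPercap_2007']

# index_col=0 turns that first column back into the row index
restored = pd.read_csv(io.StringIO(gdp.to_csv()), index_col=0)
print(restored.index.tolist())  # [0, 1]

# index=False leaves the row labels out of the file entirely
without_index = pd.read_csv(io.StringIO(gdp.to_csv(index=False)))
print(without_index.columns.tolist())  # ['country', 'gdpPercap_2007']
```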