# Basic operations

After reading the contents of a file into your Pandas DataFrame, it's important to examine your data for several reasons:
- Ensure that you loaded the data correctly.
- See what kind of data you have.
- Check the validity of your dataset.

I'll go through a couple of ways to do this.

**Viewing the first and last 5 rows**

One of the first things to do after loading your data is to look at the head and the tail of your dataset.

The ``head`` method returns the first N rows of your DataFrame (N defaults to 5).

```python
# Import libraries
import pandas as pd

# Load car loan data from a CSV file
df = pd.read_csv('car_financing.csv')
```

```python
# Select the first N rows (default N = 5)
df.head()
```

|   | Month | Starting Balance | Repayment | Interest Paid | Principal Paid | New Balance | term | interest_rate | car_type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 34689.96 | 687.23 | 202.93 | 484.3 | 34205.66 | 60 | 0.0702 | Toyota Sienna |
| 1 | 2 | 34205.66 | 687.23 | 200.1 | 487.13 | 33718.53 | 60 | 0.0702 | Toyota Sienna |
| 2 | 3 | 33718.53 | 687.23 | 197.25 | 489.98 | 33228.55 | 60 | 0.0702 | Toyota Sienna |
| 3 | 4 | 33228.55 | 687.23 | 194.38 | 492.85 | 32735.7 | 60 | 0.0702 | Toyota Sienna |
| 4 | 5 | 32735.7 | 687.23 | 191.5 | 495.73 | 32239.97 | 60 | 0.0702 | Toyota Sienna |
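
``head`` also takes an optional number of rows, ``n``. Here is a minimal sketch using a small stand-in DataFrame built from a few values in the output above (only three columns kept), so it runs without car_financing.csv:

```python
import pandas as pd

# Small stand-in for car_financing.csv, using values copied
# from the head output above (only three columns kept)
df = pd.DataFrame({
    "Month": [1, 2, 3, 4, 5],
    "Starting Balance": [34689.96, 34205.66, 33718.53, 33228.55, 32735.7],
    "Repayment": [687.23] * 5,
})

# Pass n to head to get a different number of rows than the default 5
first_two = df.head(2)
print(first_two)
```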


The ``tail`` method returns the last N rows of your DataFrame. This step matters because the format of your data can change partway through a dataset, so problems may not show up in the first few rows.

```python
# Select the last N rows (default N = 5)
df.tail()
```

|   | Month | Starting Balance | Repayment | Interest Paid | Principal Paid | New Balance | term | interest_rate | car_type |
|---|---|---|---|---|---|---|---|---|---|
| 403 | 56 | 3951.11 | 796.01 | 9.54 | 786.47 | 3164.64 | 60 | 0.029 | VW Golf R |
| 404 | 57 | 3164.64 | 796.01 | 7.64 | 788.37 | 2376.27 | 60 | 0.029 | VW Golf R |
| 405 | 58 | 2376.27 | 796.01 | 5.74 | 790.27 | 1586.0 | 60 | 0.029 | VW Golf R |
| 406 | 59 | 1586.0 | 796.01 | 3.83 | 792.18 | 793.82 | 60 | 0.029 | VW Golf R |
| 407 | 60 | 793.82 | 796.01 | 1.91 | 794.1 | -0.28 | 60 | 0.029 | VW Golf R |
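
To illustrate why the tail matters, here is a sketch with a hypothetical four-row frame that stacks two loans, using values copied from the head and tail outputs above. Looking only at the head would suggest that every row describes a Toyota Sienna:

```python
import pandas as pd

# Hypothetical miniature version of the dataset: two loans stacked
# one after another, with values copied from the outputs above
df = pd.DataFrame({
    "Month": [1, 2, 59, 60],
    "New Balance": [34205.66, 33718.53, 793.82, -0.28],
    "car_type": ["Toyota Sienna", "Toyota Sienna", "VW Golf R", "VW Golf R"],
})

# head only shows the first loan...
print(df.head(2)["car_type"].unique())

# ...while tail reveals a different car type further down
print(df.tail(2)["car_type"].unique())
```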


**Check the column data types**

Another important thing to do is to check your column data types. You can do this with the ``dtypes`` attribute. You'll notice that some columns are integers (``int64``), some are floats (``float64``), and others are ``object`` columns; for now, you can think of ``object`` columns as holding strings.

```python
# Check the column data types using the dtypes attribute.
# For example, you might wrongly assume that the values in one of your
# columns are int64 when they are actually strings.
df.dtypes
```

```
Month                 int64
Starting Balance    float64
Repayment           float64
Interest Paid       float64
Principal Paid      float64
New Balance         float64
term                  int64
interest_rate       float64
car_type             object
dtype: object
```
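
As a hedged illustration of the pitfall mentioned in the code comment above, this sketch reads a small hypothetical CSV (built in memory with ``io.StringIO``) in which one Repayment value carries a ``$`` sign. That single entry is enough to stop pandas from reading the column as ``float64``. ``pd.to_numeric`` with ``errors="coerce"`` is one way to convert it, turning entries it can't parse into NaN:

```python
import io

import pandas as pd

# Hypothetical CSV: one Repayment value was entered with a "$" sign
csv_text = """Month,Repayment
1,687.23
2,$687.23
3,687.23
"""

df = pd.read_csv(io.StringIO(csv_text))

# Repayment is not float64: pandas reads the whole column as strings
raw_dtype = df["Repayment"].dtype
print(df.dtypes)

# Convert to numbers; entries that can't be parsed become NaN
df["Repayment"] = pd.to_numeric(df["Repayment"], errors="coerce")
print(df.dtypes)
```

Coercing silently replaces the bad entry with NaN, so it's worth re-checking the null counts after a conversion like this.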

It's also important to find out how many rows and columns your dataset has. The ``shape`` attribute returns these as a (rows, columns) tuple; here, the dataset has 408 rows and 9 columns.

```python
# Use the shape attribute to get the number of rows and columns in your DataFrame
df.shape
```

```
(408, 9)
```
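
Because ``shape`` is a plain ``(rows, columns)`` tuple, it can be unpacked into two variables. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Small hypothetical frame: 3 rows, 2 columns
df = pd.DataFrame({"Month": [1, 2, 3], "term": [60, 60, 60]})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2

# len(df) gives the row count alone
print(len(df))  # 3
```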

A really important step that is often forgotten is calling the ``info`` method. It's valuable because it shows how many non-null values each column has, and many data analysis tasks and data visualizations fail or give misleading results when your dataset contains null values.

```python
# The info method gives the column data types plus the number of
# non-null values in each column
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 408 entries, 0 to 407
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Month             408 non-null    int64  
 1   Starting Balance  408 non-null    float64
 2   Repayment         408 non-null    float64
 3   Interest Paid     408 non-null    float64
 4   Principal Paid    408 non-null    float64
 5   New Balance       408 non-null    float64
 6   term              408 non-null    int64  
 7   interest_rate     408 non-null    float64
 8   car_type          408 non-null    object 
dtypes: float64(6), int64(2), object(1)
memory usage: 28.8+ KB
```
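
Besides ``info``, calling ``isnull().sum()`` gives the number of null values per column directly. Here is a sketch on a hypothetical frame in which one Interest Paid value is missing (the ``np.nan`` is inserted for illustration; it's not from car_financing.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing Interest Paid value
df = pd.DataFrame({
    "Month": [1, 2, 3],
    "Interest Paid": [202.93, np.nan, 197.25],
})

# info reports 2 non-null values for Interest Paid vs 3 for Month
df.info()

# Count the nulls in each column directly
null_counts = df.isnull().sum()
print(null_counts)
```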


In the ``info`` output, every column reports 408 non-null values, matching the 408 entries in the RangeIndex, so this dataset has no missing values. If a column such as Interest Paid had instead reported 407 non-null values while every other column reported 408, that would mean it contains one null value. You would then have to either remove that row or fill in the missing value with some sort of imputation technique. In the end, it's really important to remember to verify your data: use the techniques shown here to make sure that everything looks good.
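
The two options mentioned above, removing the row or filling in the value, map onto ``dropna`` and ``fillna``. This is a minimal sketch, assuming a hypothetical frame with one missing Interest Paid value; filling with the column mean is just one simple imputation choice, assumed here for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing Interest Paid value
df = pd.DataFrame({
    "Month": [1, 2, 3],
    "Interest Paid": [202.93, np.nan, 197.25],
})

# Option 1: remove any row that contains a null value
dropped = df.dropna()

# Option 2: impute the missing value with the column mean
# (mean() skips NaN, so it averages only the two known values)
filled = df.fillna({"Interest Paid": df["Interest Paid"].mean()})

print(dropped)
print(filled)
```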