![pandas.png](media/pandas.png)

---

## Python Pandas For Absolute Beginners






### What is Pandas?
- Pandas is a Python library used in data science and data analytics.
- It has functions and methods that are used for exploratory analysis and data manipulations.




### Why Learn and Use Pandas?
1. Pandas allows data scientists to import, analyze and explore data.
2. Pandas is used for data pre-processing, especially data cleaning.
3. Pandas gives data scientists statistical summaries of data, such as the mean, median and correlation.
4. Pandas is easy to learn.




[Pandas Official Website](https://pandas.pydata.org/)

### Install Pandas


` pip install pandas `




In [1]:
# Importing Pandas
import pandas as pd

In [2]:
# Check version of Pandas
print(pd.__version__)


1.2.2


# Series and DataFrame

## What is a Series?
- A Pandas Series is a 1-D array holding data of any type.

- It is like a column in a table or matrix.

## What is a DataFrame? 

- When a dataset is multi-dimensional, it is stored in a structure called a DataFrame.

- If a Series is like a column, then the DataFrame is the whole table.

![SeriesPandas.png](media/SeriesPandas.png)
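
The column-versus-table comparison above can be checked directly. The following is a minimal sketch; the frame `table` and its column names are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical two-column table with three rows
table = pd.DataFrame({'length': [1.4, 1.3, 1.5], 'width': [0.2, 0.2, 0.4]})

# Selecting a single column from a DataFrame gives back a Series
column = table['length']

print(type(table).__name__)     # DataFrame
print(type(column).__name__)    # Series
print(table.ndim, column.ndim)  # 2 1
```

The `ndim` attribute reports the number of dimensions, which is one way to tell the two structures apart.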

## Series


In [3]:
# Create Pandas Series from a Python List
data = [1, 2, 3, 4]

In [5]:
type(data)

list

In [13]:
x = pd.Series(data, index=['Mon', 'Tue', 'Wed', 'Thu'])

In [14]:
x

Mon    1
Tue    2
Wed    3
Thu    4
dtype: int64

In [8]:
type(x)

pandas.core.series.Series
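
Once a Series has labels, values can be looked up by label instead of by position. A short sketch using the same values and labels as `x` above (the variable names `tue`, `mid_week` and `first` are only illustrative):

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4], index=['Mon', 'Tue', 'Wed', 'Thu'])

# Look up a single value by its label
tue = x['Tue']

# Look up several labels at once; the result is a smaller Series
mid_week = x[['Tue', 'Wed']]

# Position-based lookup goes through iloc
first = x.iloc[0]

print(tue, list(mid_week), first)
```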


In [15]:
# Create Pandas Series from a Python Dictionary
data = {
    'name':'Kenneth',
    'age': 30,
    'email': 'kbroni@gmail.com'
}

In [16]:
data

{'name': 'Kenneth', 'age': 30, 'email': 'kbroni@gmail.com'}

In [17]:
type(data)

dict

In [18]:
y = pd.Series(data)

In [19]:
y

name              Kenneth
age                    30
email    kbroni@gmail.com
dtype: object

##  DataFrame



In [20]:
# Create Pandas DataFrame
data = [1, 2, 3]


In [21]:
data

[1, 2, 3]

In [22]:
z = pd.DataFrame(data)

In [23]:
z

Unnamed: 0,0
0,1
1,2
2,3


In [24]:
type(z)

pandas.core.frame.DataFrame

In [25]:
data = [[10, 20, 30], [40, 50, 60]]

In [30]:
z = pd.DataFrame(data, columns=['Jan', 'Feb', 'Mar'], index=['Week1', 'Week2'])

In [31]:
z

Unnamed: 0,Jan,Feb,Mar
Week1,10,20,30
Week2,40,50,60


In [32]:
# Adding a row using loc
z

Unnamed: 0,Jan,Feb,Mar
Week1,10,20,30
Week2,40,50,60


In [33]:
z = pd.DataFrame(data, columns=['Jan', 'Feb', 'Mar'])

In [34]:
z

Unnamed: 0,Jan,Feb,Mar
0,10,20,30
1,40,50,60


In [35]:
z.loc[2] = [100, 200, 300]

In [36]:
z

Unnamed: 0,Jan,Feb,Mar
0,10,20,30
1,40,50,60
2,100,200,300
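
Assigning to `loc` with a label that does not exist yet adds a new row, which is what happened above. The sketch below repeats the idea with the `Week1`/`Week2` labels used earlier and contrasts `loc` (label-based) with `iloc` (position-based); the `Week3` label is an assumption added for illustration.

```python
import pandas as pd

z = pd.DataFrame([[10, 20, 30], [40, 50, 60]],
                 columns=['Jan', 'Feb', 'Mar'],
                 index=['Week1', 'Week2'])

# Assigning to a label that is not in the index appends a row
z.loc['Week3'] = [100, 200, 300]

# loc reads a cell by row and column label
by_label = z.loc['Week3', 'Feb']

# iloc reads the same cell by integer position (third row, second column)
by_position = z.iloc[2, 1]

print(by_label, by_position)
```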


In [37]:
# Create Pandas DataFrame from a Python Dict
data = {
    'Monday': [10, 20, 30],
    'Tuesday': [100, 200, 300],
    'Wednesday': [90, 80, 70]
}

In [38]:
data

{'Monday': [10, 20, 30], 'Tuesday': [100, 200, 300], 'Wednesday': [90, 80, 70]}

In [39]:
y = pd.DataFrame(data)

In [40]:
y

Unnamed: 0,Monday,Tuesday,Wednesday
0,10,100,90
1,20,200,80
2,30,300,70


In [45]:
# Using the loc attribute to return one or more specified row(s)
y.loc[[0,2]]

Unnamed: 0,Monday,Tuesday,Wednesday
0,10,100,90
2,30,300,70


In [54]:
y[['Tuesday', 'Wednesday']]

Unnamed: 0,Tuesday,Wednesday
0,100,90
1,200,80
2,300,70
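
The number of brackets matters when selecting columns. A minimal sketch with the same data as `y`: single brackets return a Series, double brackets return a DataFrame, and `loc` can pick rows and columns in one step. The variable names are illustrative only.

```python
import pandas as pd

y = pd.DataFrame({
    'Monday': [10, 20, 30],
    'Tuesday': [100, 200, 300],
    'Wednesday': [90, 80, 70],
})

# Single brackets with one name: a Series
one_col = y['Tuesday']

# Double brackets (a list of names): a DataFrame, even with one column
one_col_frame = y[['Tuesday']]

# loc takes row labels first, then column labels
subset = y.loc[[0, 2], ['Monday', 'Wednesday']]

print(type(one_col).__name__, type(one_col_frame).__name__)
print(subset)
```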


###  Dataset and Data Sources
- Kaggle
- UCI Machine Learning Repository
- Experimental trials




#### Iris Dataset
[Iris Dataset from Kaggle](https://www.kaggle.com/uciml/iris?select=Iris.csv)

[Iris Dataset from UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/iris)

![alt text](media/iris.png)

In [59]:
# Import from CSV file
dataset = pd.read_csv('Iris.csv')

In [61]:
# View Data
type(dataset)

pandas.core.frame.DataFrame

In [62]:
dataset

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica


In [74]:
# Check Head --- head() returns the first 5 rows by default; head(10) returns the first 10
dataset.head(10)

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
5,6,5.4,3.9,1.7,0.4,Iris-setosa
6,7,4.6,3.4,1.4,0.3,Iris-setosa
7,8,5.0,3.4,1.5,0.2,Iris-setosa
8,9,4.4,2.9,1.4,0.2,Iris-setosa
9,10,4.9,3.1,1.5,0.1,Iris-setosa


In [76]:
# Check Tail --- tail() returns the last 5 rows by default; tail(10) returns the last 10
dataset.tail(10)

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
140,141,6.7,3.1,5.6,2.4,Iris-virginica
141,142,6.9,3.1,5.1,2.3,Iris-virginica
142,143,5.8,2.7,5.1,1.9,Iris-virginica
143,144,6.8,3.2,5.9,2.3,Iris-virginica
144,145,6.7,3.3,5.7,2.5,Iris-virginica
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica
149,150,5.9,3.0,5.1,1.8,Iris-virginica


In [78]:
# Check Shape --- Returns a tuple showing the number of rows and columns
type(dataset.shape)

tuple

In [79]:
dataset.shape

(150, 6)

In [80]:
# Check Info --- Returns basic information on the DataFrame
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
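
The inspection steps above can be tried without downloading anything. This sketch assumes a tiny in-memory CSV (`csv_text`, holding three rows and three of the Iris columns) in place of `Iris.csv`; `read_csv` accepts a file-like object as well as a file name.

```python
import io
import pandas as pd

# A small stand-in for Iris.csv, kept in memory
csv_text = """Id,SepalLengthCm,Species
1,5.1,Iris-setosa
2,4.9,Iris-setosa
3,4.7,Iris-setosa
"""

sample = pd.read_csv(io.StringIO(csv_text))

# shape is a (rows, columns) tuple, so it can be unpacked
rows, cols = sample.shape

first_two = sample.head(2)  # first 2 rows
last_one = sample.tail(1)   # last row

print(rows, cols)
print(first_two)
print(last_one)
```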


## Pandas - Data Pre-Processing and Cleaning


Data cleaning means fixing errors or bad data in your dataset. This is a pre-processing activity that needs to be carried out before using the dataset.

A bad dataset could contain any combination of:

- Empty cells or null values
- Data in wrong format
- Wrong data
- Duplicates
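
Two of the problems listed above, empty cells and duplicates, can be counted before any cleaning is done. A minimal sketch on a made-up frame (`messy` is hypothetical, not the notebook's dataset):

```python
import pandas as pd

nan = float('nan')

# Hypothetical data with one missing value and one repeated row
messy = pd.DataFrame({
    'Id': [1, 2, 2, 3],
    'PetalWidthCm': [nan, 0.2, 0.2, 0.2],
})

# isna() marks missing cells; sum() counts them per column
missing_per_column = messy.isna().sum()

# duplicated() marks rows that repeat an earlier row
duplicate_rows = int(messy.duplicated().sum())

print(missing_per_column)
print(duplicate_rows)
```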


In [81]:
data = pd.read_csv('Iris_modified.csv')

In [82]:
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,,Iris-setosa
1,2,4.9,3.0,1.4,,Iris-setosa
2,2,4.9,3.0,1.4,,Iris-setosa
3,3,4.7,3.2,,0.2,Iris-setosa
4,4,4.6,3.1,1.5,0.2,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.0,5.2,2.3,Iris-virginica
150,147,6.3,,5.0,1.9,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
152,149,6.2,3.4,5.4,2.3,Iris-virginica


In [83]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154 entries, 0 to 153
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             154 non-null    int64  
 1   SepalLengthCm  153 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  152 non-null    float64
 4   PetalWidthCm   147 non-null    float64
 5   Species        154 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.3+ KB


In [84]:
# Remove Null Values -- dropna()
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,,Iris-setosa
1,2,4.9,3.0,1.4,,Iris-setosa
2,2,4.9,3.0,1.4,,Iris-setosa
3,3,4.7,3.2,,0.2,Iris-setosa
4,4,4.6,3.1,1.5,0.2,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.0,5.2,2.3,Iris-virginica
150,147,6.3,,5.0,1.9,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
152,149,6.2,3.4,5.4,2.3,Iris-virginica


In [93]:
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
4,4,4.6,3.1,1.5,0.2,Iris-setosa
5,5,5.0,3.6,1.4,0.2,Iris-setosa
6,6,5.4,3.9,1.7,0.4,Iris-setosa
8,8,5.0,3.4,1.5,0.2,Iris-setosa
9,9,4.4,2.9,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
147,144,6.8,3.2,5.9,2.3,Iris-virginica
148,145,6.7,3.3,5.7,2.5,Iris-virginica
149,146,6.7,3.0,5.2,2.3,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica


In [86]:
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,,Iris-setosa
1,2,4.9,3.0,1.4,,Iris-setosa
2,2,4.9,3.0,1.4,,Iris-setosa
3,3,4.7,3.2,,0.2,Iris-setosa
4,4,4.6,3.1,1.5,0.2,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.0,5.2,2.3,Iris-virginica
150,147,6.3,,5.0,1.9,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
152,149,6.2,3.4,5.4,2.3,Iris-virginica


In [94]:
data = pd.read_csv('Iris_modified.csv')

In [96]:
x = data.dropna()

In [97]:
x

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
4,4,4.6,3.1,1.5,0.2,Iris-setosa
5,5,5.0,3.6,1.4,0.2,Iris-setosa
6,6,5.4,3.9,1.7,0.4,Iris-setosa
8,8,5.0,3.4,1.5,0.2,Iris-setosa
9,9,4.4,2.9,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
147,144,6.8,3.2,5.9,2.3,Iris-virginica
148,145,6.7,3.3,5.7,2.5,Iris-virginica
149,146,6.7,3.0,5.2,2.3,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
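
By default `dropna()` removes every row that has at least one missing value and returns a new DataFrame, leaving the original untouched. If only some columns matter, the `subset` parameter limits the check to them. A sketch on a small made-up frame:

```python
import pandas as pd

nan = float('nan')

# Hypothetical frame with missing values in different columns
frame = pd.DataFrame({
    'SepalWidthCm': [3.5, nan, 3.2],
    'PetalWidthCm': [nan, 0.2, 0.2],
})

# Drop rows with a missing value in any column
all_clean = frame.dropna()

# Drop rows only when SepalWidthCm is missing
width_clean = frame.dropna(subset=['SepalWidthCm'])

print(list(all_clean.index))    # rows that survive the strict check
print(list(width_clean.index))  # rows that survive the narrower check
print(len(frame))               # the original still has all its rows
```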


In [98]:
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,,Iris-setosa
1,2,4.9,3.0,1.4,,Iris-setosa
2,2,4.9,3.0,1.4,,Iris-setosa
3,3,4.7,3.2,,0.2,Iris-setosa
4,4,4.6,3.1,1.5,0.2,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.0,5.2,2.3,Iris-virginica
150,147,6.3,,5.0,1.9,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
152,149,6.2,3.4,5.4,2.3,Iris-virginica


In [99]:
# Replace Null Values -- fillna()
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,,Iris-setosa
1,2,4.9,3.0,1.4,,Iris-setosa
2,2,4.9,3.0,1.4,,Iris-setosa
3,3,4.7,3.2,,0.2,Iris-setosa
4,4,4.6,3.1,1.5,0.2,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.0,5.2,2.3,Iris-virginica
150,147,6.3,,5.0,1.9,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
152,149,6.2,3.4,5.4,2.3,Iris-virginica


In [100]:
y = data.fillna(200)

In [101]:
y

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,200.0,Iris-setosa
1,2,4.9,3.0,1.4,200.0,Iris-setosa
2,2,4.9,3.0,1.4,200.0,Iris-setosa
3,3,4.7,3.2,200.0,0.2,Iris-setosa
4,4,4.6,3.1,1.5,0.2,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.0,5.2,2.3,Iris-virginica
150,147,6.3,200.0,5.0,1.9,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
152,149,6.2,3.4,5.4,2.3,Iris-virginica


In [102]:
y.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154 entries, 0 to 153
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             154 non-null    int64  
 1   SepalLengthCm  154 non-null    float64
 2   SepalWidthCm   154 non-null    float64
 3   PetalLengthCm  154 non-null    float64
 4   PetalWidthCm   154 non-null    float64
 5   Species        154 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.3+ KB


In [128]:
# Replace Null Values for Specific Columns -- fillna()
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,,Iris-setosa
1,2,4.9,3.0,1.4,,Iris-setosa
2,2,4.9,3.0,1.4,,Iris-setosa
3,3,4.7,3.2,,0.2,Iris-setosa
4,4,4.6,3.1,1.5,0.2,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.0,5.2,2.3,Iris-virginica
150,147,6.3,,5.0,1.9,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
152,149,6.2,3.4,5.4,2.3,Iris-virginica


In [133]:
data['PetalWidthCm'] = data['PetalWidthCm'].fillna(800)
data['PetalLengthCm'] = data['PetalLengthCm'].fillna(100)

In [134]:
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,800.0,Iris-setosa
1,2,4.9,3.0,1.4,800.0,Iris-setosa
2,2,4.9,3.0,1.4,800.0,Iris-setosa
3,3,4.7,3.2,100.0,0.2,Iris-setosa
4,4,4.6,3.1,1.5,0.2,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.0,5.2,2.3,Iris-virginica
150,147,6.3,,5.0,1.9,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
152,149,6.2,3.4,5.4,2.3,Iris-virginica
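
As an alternative to filling one column at a time, `fillna()` also accepts a dictionary that maps column names to replacement values. The sketch below uses a made-up two-row frame and the same placeholder values (100 and 800) as above.

```python
import pandas as pd

nan = float('nan')

# Hypothetical frame with one gap in each column
frame = pd.DataFrame({
    'PetalLengthCm': [1.4, nan],
    'PetalWidthCm': [nan, 0.2],
})

# Each column gets its own replacement value
filled = frame.fillna({'PetalLengthCm': 100, 'PetalWidthCm': 800})

print(filled)
```

Columns that are not named in the dictionary are left as they are, and `frame` itself is not modified.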


In [136]:
# Replace Null Values Using Mean, Median or Mode -- fillna()
data = pd.read_csv('Iris_modified.csv')

In [137]:
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,,Iris-setosa
1,2,4.9,3.0,1.4,,Iris-setosa
2,2,4.9,3.0,1.4,,Iris-setosa
3,3,4.7,3.2,,0.2,Iris-setosa
4,4,4.6,3.1,1.5,0.2,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.0,5.2,2.3,Iris-virginica
150,147,6.3,,5.0,1.9,Iris-virginica
151,148,6.5,3.0,5.2,2.0,Iris-virginica
152,149,6.2,3.4,5.4,2.3,Iris-virginica


In [138]:
data.PetalLengthCm

0      1.4
1      1.4
2      1.4
3      NaN
4      1.5
      ... 
149    5.2
150    5.0
151    5.2
152    5.4
153    NaN
Name: PetalLengthCm, Length: 154, dtype: float64

In [141]:
mean_SL = data.SepalLengthCm.mean()
mean_SW = data.SepalWidthCm.mean()
mean_PL = data.PetalLengthCm.mean()
mean_PW = data.PetalWidthCm.mean()

In [142]:
print(f'SepalLengthCm: {mean_SL}')
print(f'SepalWidthCm: {mean_SW}')
print(f'PetalLengthCm: {mean_PL}')
print(f'PetalWidthCm: {mean_PW}')

SepalLengthCm: 5.827450980392157
SepalWidthCm: 3.0433333333333334
PetalLengthCm: 3.746052631578947
PetalWidthCm: 1.195918367346939


In [144]:
data['SepalLengthCm'] = data['SepalLengthCm'].fillna(mean_SL)
data['SepalWidthCm'] = data['SepalWidthCm'].fillna(mean_SW)
data['PetalLengthCm'] = data['PetalLengthCm'].fillna(mean_PL)
data['PetalWidthCm'] = data['PetalWidthCm'].fillna(mean_PW)

In [145]:
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.500000,1.400000,1.195918,Iris-setosa
1,2,4.9,3.000000,1.400000,1.195918,Iris-setosa
2,2,4.9,3.000000,1.400000,1.195918,Iris-setosa
3,3,4.7,3.200000,3.746053,0.200000,Iris-setosa
4,4,4.6,3.100000,1.500000,0.200000,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.000000,5.200000,2.300000,Iris-virginica
150,147,6.3,3.043333,5.000000,1.900000,Iris-virginica
151,148,6.5,3.000000,5.200000,2.000000,Iris-virginica
152,149,6.2,3.400000,5.400000,2.300000,Iris-virginica


In [146]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154 entries, 0 to 153
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             154 non-null    int64  
 1   SepalLengthCm  154 non-null    float64
 2   SepalWidthCm   154 non-null    float64
 3   PetalLengthCm  154 non-null    float64
 4   PetalWidthCm   154 non-null    float64
 5   Species        154 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.3+ KB


### Exercise

#### Use the Median and Mode values of columns to replace missing values in their respective columns

In [None]:
# mean(), mode(), median()
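
As a hint for the exercise, here is a sketch on a made-up Series rather than the Iris columns. One detail worth knowing: `median()` returns a single number, while `mode()` returns a Series (there can be ties), so the first entry is taken with `.iloc[0]`.

```python
import pandas as pd

nan = float('nan')

# Hypothetical column with one missing value
s = pd.Series([1.0, 1.0, 2.0, 6.0, nan])

median_value = s.median()     # missing values are skipped
mode_value = s.mode().iloc[0]  # first (here, only) most frequent value

filled_with_median = s.fillna(median_value)
filled_with_mode = s.fillna(mode_value)

print(median_value, mode_value)
```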

In [147]:
# Remove Duplicates

# duplicated() flags rows that repeat an earlier row
data
# drop_duplicates() removes the flagged rows


Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.500000,1.400000,1.195918,Iris-setosa
1,2,4.9,3.000000,1.400000,1.195918,Iris-setosa
2,2,4.9,3.000000,1.400000,1.195918,Iris-setosa
3,3,4.7,3.200000,3.746053,0.200000,Iris-setosa
4,4,4.6,3.100000,1.500000,0.200000,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.000000,5.200000,2.300000,Iris-virginica
150,147,6.3,3.043333,5.000000,1.900000,Iris-virginica
151,148,6.5,3.000000,5.200000,2.000000,Iris-virginica
152,149,6.2,3.400000,5.400000,2.300000,Iris-virginica


In [152]:
data.duplicated()

0      False
1      False
2       True
3      False
4      False
       ...  
149    False
150    False
151    False
152    False
153    False
Length: 154, dtype: bool

In [170]:
data = data.drop_duplicates()
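
`drop_duplicates()` keeps the original row labels, so the index has gaps afterwards. If a continuous index is wanted, `reset_index(drop=True)` renumbers the rows. A sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical frame in which row 2 repeats row 1
frame = pd.DataFrame({
    'Id': [1, 2, 2, 3],
    'Species': ['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica'],
})

deduped = frame.drop_duplicates()
print(list(deduped.index))     # label 2 is missing

# drop=True discards the old labels instead of keeping them as a column
renumbered = deduped.reset_index(drop=True)
print(list(renumbered.index))
```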

## Pandas - Basic Data Analysis


Data analysis simply means gaining insight into data. It involves using tools such as Python, R, Excel and SQL, and libraries such as Pandas and NumPy, to understand data.

In this section, we will look at the following techniques in Data Analysis:

- Filtering
- Sorting
- Data Correlation



In [174]:
# Filtering
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.500000,1.400000,1.195918,Iris-setosa
1,2,4.9,3.000000,1.400000,1.195918,Iris-setosa
3,3,4.7,3.200000,3.746053,0.200000,Iris-setosa
4,4,4.6,3.100000,1.500000,0.200000,Iris-setosa
5,5,5.0,3.600000,1.400000,0.200000,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.000000,5.200000,2.300000,Iris-virginica
150,147,6.3,3.043333,5.000000,1.900000,Iris-virginica
151,148,6.5,3.000000,5.200000,2.000000,Iris-virginica
152,149,6.2,3.400000,5.400000,2.300000,Iris-virginica


In [176]:
data.filter(items=['SepalLengthCm', 'SepalWidthCm'])

Unnamed: 0,SepalLengthCm,SepalWidthCm
0,5.1,3.500000
1,4.9,3.000000
3,4.7,3.200000
4,4.6,3.100000
5,5.0,3.600000
...,...,...
149,6.7,3.000000
150,6.3,3.043333
151,6.5,3.000000
152,6.2,3.400000


In [181]:
data.filter(like='Leng')

Unnamed: 0,SepalLengthCm,PetalLengthCm
0,5.1,1.400000
1,4.9,1.400000
3,4.7,3.746053
4,4.6,1.500000
5,5.0,1.400000
...,...,...
149,6.7,5.200000
150,6.3,5.000000
151,6.5,5.200000
152,6.2,5.400000
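
Note that `DataFrame.filter()` selects by label (here, column names), not by the values in the cells. To keep only the rows that meet a condition, a common approach is a boolean mask. A sketch on a made-up three-row frame; the threshold of 5.0 is arbitrary.

```python
import pandas as pd

# Hypothetical rows in the style of the Iris data
frame = pd.DataFrame({
    'SepalLengthCm': [5.1, 4.9, 6.7],
    'Species': ['Iris-setosa', 'Iris-setosa', 'Iris-virginica'],
})

# The comparison produces a True/False Series; indexing with it keeps the True rows
long_sepals = frame[frame['SepalLengthCm'] > 5.0]

# Conditions are combined with & (and) or | (or); each one needs its own parentheses
setosa_long = frame[(frame['SepalLengthCm'] > 5.0) & (frame['Species'] == 'Iris-setosa')]

print(long_sepals)
print(setosa_long)
```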


In [182]:
# Sorting
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.500000,1.400000,1.195918,Iris-setosa
1,2,4.9,3.000000,1.400000,1.195918,Iris-setosa
3,3,4.7,3.200000,3.746053,0.200000,Iris-setosa
4,4,4.6,3.100000,1.500000,0.200000,Iris-setosa
5,5,5.0,3.600000,1.400000,0.200000,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.000000,5.200000,2.300000,Iris-virginica
150,147,6.3,3.043333,5.000000,1.900000,Iris-virginica
151,148,6.5,3.000000,5.200000,2.000000,Iris-virginica
152,149,6.2,3.400000,5.400000,2.300000,Iris-virginica


In [185]:
data.sort_values('SepalLengthCm', ascending=False)

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
134,132,7.9,3.8,6.4,2.0,Iris-virginica
139,136,7.7,3.0,6.1,2.3,Iris-virginica
125,123,7.7,2.8,6.7,2.0,Iris-virginica
120,118,7.7,3.8,6.7,2.2,Iris-virginica
121,119,7.7,2.6,6.9,2.3,Iris-virginica
...,...,...,...,...,...,...
43,42,4.5,2.3,1.3,0.3,Iris-setosa
44,43,4.4,3.2,1.3,0.2,Iris-setosa
40,39,4.4,3.0,1.3,0.2,Iris-setosa
9,9,4.4,2.9,1.4,0.2,Iris-setosa
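
Several rows above share the same `SepalLengthCm`. To control how such ties are ordered, `sort_values()` accepts a list of columns and a matching list of `ascending` flags. A sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical rows, two of which tie on SepalLengthCm
frame = pd.DataFrame({
    'SepalLengthCm': [7.7, 7.9, 7.7],
    'SepalWidthCm': [3.0, 3.8, 2.6],
})

# Longest sepals first; ties broken by the narrowest sepal first
ordered = frame.sort_values(['SepalLengthCm', 'SepalWidthCm'],
                            ascending=[False, True])

print(ordered)
```

Like the other methods shown here, `sort_values()` returns a new DataFrame and leaves the original order unchanged.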


In [186]:
# Data Correlation
data

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.500000,1.400000,1.195918,Iris-setosa
1,2,4.9,3.000000,1.400000,1.195918,Iris-setosa
3,3,4.7,3.200000,3.746053,0.200000,Iris-setosa
4,4,4.6,3.100000,1.500000,0.200000,Iris-setosa
5,5,5.0,3.600000,1.400000,0.200000,Iris-setosa
...,...,...,...,...,...,...
149,146,6.7,3.000000,5.200000,2.300000,Iris-virginica
150,147,6.3,3.043333,5.000000,1.900000,Iris-virginica
151,148,6.5,3.000000,5.200000,2.000000,Iris-virginica
152,149,6.2,3.400000,5.400000,2.300000,Iris-virginica


In [187]:
data.corr()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
Id,1.0,0.716372,-0.371036,0.865748,0.860672
SepalLengthCm,0.716372,1.0,-0.092577,0.865596,0.800449
SepalWidthCm,-0.371036,-0.092577,1.0,-0.400327,-0.329303
PetalLengthCm,0.865748,0.865596,-0.400327,1.0,0.928142
PetalWidthCm,0.860672,0.800449,-0.329303,0.928142,1.0
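
The output above comes from Pandas 1.2.2, which silently leaves the text column `Species` out of the result. Newer releases (2.0 and later, as far as I am aware) raise an error on text columns unless told otherwise, so a version-independent approach is to select the numeric columns first. The sketch below uses a made-up four-row frame, not the full dataset.

```python
import pandas as pd

# Hypothetical rows in the style of the Iris data
frame = pd.DataFrame({
    'PetalLengthCm': [1.4, 1.5, 5.2, 5.4],
    'PetalWidthCm': [0.2, 0.2, 2.3, 2.3],
    'Species': ['Iris-setosa', 'Iris-setosa', 'Iris-virginica', 'Iris-virginica'],
})

# Keep only the numeric columns, then compute pairwise correlations
numeric = frame.select_dtypes(include='number')
corr_matrix = numeric.corr()

# Correlation between two specific columns
single = frame['PetalLengthCm'].corr(frame['PetalWidthCm'])

print(corr_matrix)
print(single)
```

Values close to 1 or -1 indicate a strong linear relationship, values close to 0 a weak one. The `Id` column in the real dataset is just a row counter, so its correlations mostly reflect the order in which the species appear in the file.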
