###**Creating a Copy of a DataFrame**

In [None]:
df = iris
## The statement above only makes df refer to the same DataFrame object that iris refers to.
## Both names now point to one object, so any change made through one name is visible through the other.
## No second DataFrame object is created.

To create an independent copy, we use the **copy()** method:

In [None]:
df = iris.copy()
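
The difference between plain assignment and **copy()** can be checked on a tiny stand-in frame. This is a minimal sketch: the names `original`, `alias`, and `clone` and the values are made up for illustration and are not part of the Iris data.

```python
import pandas as pd

# Small stand-in frame (hypothetical values, not the Iris data)
original = pd.DataFrame({"sl": [5.1, 4.9], "sw": [3.5, 3.0]})

alias = original          # a second name for the same object
clone = original.copy()   # an independent copy (deep by default)

# A change made through the alias is visible through the original name
alias.loc[0, "sl"] = 99.0

print(alias is original)       # True
print(original.loc[0, "sl"])   # 99.0
print(clone.loc[0, "sl"])      # 5.1, the copy is unaffected
```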

In [None]:
df.shape

(149, 5)

As you can see, we have 149 rows and 5 columns. The Iris dataset actually has 150 rows: 3 types of flower with 50 samples each. One row is missing because the first row of the file was treated as the column names. To fix this, we do the following:

In [None]:
# Ignoring header -> if the file has no header row, set header=None so the first row is read as data
iris = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=None)
iris

Unnamed: 0,0,1,2,3,4
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica


In [None]:
df = iris.copy()
df.shape

(150, 5)
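
The effect of `header=None` can be reproduced without a network connection by reading a few headerless lines from memory. This is a sketch under assumptions: the three sample lines follow the layout of `iris.data`, and the `names=` argument shown at the end is an optional extra that the text above does not use.

```python
import io
import pandas as pd

# Three headerless rows in the same layout as iris.data
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Default: the first row is consumed as the header, so one data row is lost
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every row is data, columns are labelled 0..4
with_header_none = pd.read_csv(io.StringIO(raw), header=None)

# Optional: supply column names while reading
with_names = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["sl", "sw", "pl", "pw", "flower_type"],
)

print(with_default.shape)        # (2, 5)
print(with_header_none.shape)    # (3, 5)
print(list(with_names.columns))  # ['sl', 'sw', 'pl', 'pw', 'flower_type']
```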

To see the datatypes of each column we do the following:

In [None]:
df.dtypes

0    float64
1    float64
2    float64
3    float64
4     object
dtype: object

Currently, our columns have no names.

In [None]:
df.columns

Int64Index([0, 1, 2, 3, 4], dtype='int64')

To name them, we simply assign a list of names to df.columns:

In [None]:
df.columns = ['sl', 'sw', 'pl', 'pw', 'flower_type']
df

Unnamed: 0,sl,sw,pl,pw,flower_type
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica


In [None]:
df.dtypes

sl             float64
sw             float64
pl             float64
pw             float64
flower_type     object
dtype: object
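
Assigning to `df.columns` replaces every name at once, so the list must have exactly one entry per column. As a hedged alternative, `rename()` changes only the columns you list and returns a new DataFrame. The frame `df_demo` below is a made-up three-column example, not the Iris data.

```python
import pandas as pd

# One row with default integer column labels 0, 1, 2 (hypothetical data)
df_demo = pd.DataFrame([[5.1, 3.5, "Iris-setosa"]])

# rename() maps old labels to new ones and leaves df_demo itself unchanged
renamed = df_demo.rename(columns={0: "sl", 1: "sw", 2: "flower_type"})
print(list(df_demo.columns))   # [0, 1, 2]
print(list(renamed.columns))   # ['sl', 'sw', 'flower_type']

# Assigning a list of the wrong length to .columns raises ValueError
rejected = False
try:
    df_demo.columns = ["sl", "sw"]   # only 2 names for 3 columns
except ValueError as err:
    rejected = True
    print("rejected:", err)
```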

We can get summary statistics for the numeric columns of our data using **describe()**:

In [None]:
df.describe()

Unnamed: 0,sl,sw,pl,pw
count,150.0,150.0,150.0,150.0
mean,5.843333,3.054,3.758667,1.198667
std,0.828066,0.433594,1.76442,0.763161
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5
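
Note that the output above has no `flower_type` column: by default **describe()** summarises only numeric columns. A minimal sketch of how text columns can be included, using a small made-up frame (`demo` and its values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical four-row frame with one numeric and one text column
demo = pd.DataFrame({
    "sl": [5.1, 7.0, 6.3, 5.8],
    "flower_type": ["Iris-setosa", "Iris-versicolor",
                    "Iris-virginica", "Iris-virginica"],
})

# Default: numeric columns only
numeric_summary = demo.describe()
print(list(numeric_summary.columns))   # ['sl']

# include="all" adds count/unique/top/freq rows for non-numeric columns
all_summary = demo.describe(include="all")
print(all_summary.loc["unique", "flower_type"])  # 3
print(all_summary.loc["top", "flower_type"])     # Iris-virginica

# value_counts() gives the number of rows per category
counts = demo["flower_type"].value_counts()
print(counts["Iris-virginica"])                  # 2
```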


##**Some Basic Functionalities**

###**Viewing the DataFrame**

We have the **head()** and **tail()** functions for viewing the DataFrame.


####**head()**



This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

By default, value of n = 5.

In [None]:
df.head()

Unnamed: 0,sl,sw,pl,pw,flower_type
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [None]:
df.head(10)

Unnamed: 0,sl,sw,pl,pw,flower_type
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa


####**tail()**

This function returns the last n rows for the object based on position. It is useful for quickly verifying data, for example after sorting or appending rows.

By default, value of n = 5.

In [None]:
df.tail()

Unnamed: 0,sl,sw,pl,pw,flower_type
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica
149,5.9,3.0,5.1,1.8,Iris-virginica


In [None]:
df.tail(11)

Unnamed: 0,sl,sw,pl,pw,flower_type
139,6.9,3.1,5.4,2.1,Iris-virginica
140,6.7,3.1,5.6,2.4,Iris-virginica
141,6.9,3.1,5.1,2.3,Iris-virginica
142,5.8,2.7,5.1,1.9,Iris-virginica
143,6.8,3.2,5.9,2.3,Iris-virginica
144,6.7,3.3,5.7,2.5,Iris-virginica
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica
149,5.9,3.0,5.1,1.8,Iris-virginica
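
Both functions also accept a negative n, which means "everything except n rows". A short sketch on a made-up ten-row frame (`demo` is an illustrative name):

```python
import pandas as pd

# Hypothetical frame with a single column holding 0..9
demo = pd.DataFrame({"n": range(10)})

print(demo.head(3)["n"].tolist())    # [0, 1, 2]
print(demo.tail(3)["n"].tolist())    # [7, 8, 9]

# Negative n: all rows except the last 3 / except the first 3
print(demo.head(-3)["n"].tolist())   # [0, 1, 2, 3, 4, 5, 6]
print(demo.tail(-3)["n"].tolist())   # [3, 4, 5, 6, 7, 8, 9]
```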


###**Accessing Data**

Sometimes, we may want to look at a single column from the DataFrame. This can be done with attribute access:

In [None]:
## Viewing sl column
df.sl

0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ... 
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sl, Length: 150, dtype: float64

**or, equivalently, with bracket notation**

In [None]:
df['sl']

0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ... 
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sl, Length: 150, dtype: float64
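
The two forms return the same Series here, but they are not always interchangeable. The sketch below uses a made-up frame whose column names (`sepal width`, `count`) were chosen to show where attribute access stops working.

```python
import pandas as pd

# Hypothetical frame: one plain name, one with a space, one that clashes with a method
demo = pd.DataFrame({
    "sl": [5.1, 4.9],
    "sepal width": [3.5, 3.0],
    "count": [1, 2],
})

# For a simple name both forms give the same Series
print(demo.sl.equals(demo["sl"]))      # True

# A name containing a space can only be reached with brackets
print(demo["sepal width"].tolist())    # [3.5, 3.0]

# demo.count is the DataFrame.count method, not the column
print(callable(demo.count))            # True
print(demo["count"].tolist())          # [1, 2]

# A list of names inside the brackets selects several columns as a DataFrame
subset = demo[["sl", "count"]]
print(subset.shape)                    # (2, 2)
```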

###**Checking for NULL values**

In [None]:
df.isnull()

Unnamed: 0,sl,sw,pl,pw,flower_type
0,False,False,False,False,False
1,False,False,False,False,False
2,False,False,False,False,False
3,False,False,False,False,False
4,False,False,False,False,False
...,...,...,...,...,...
145,False,False,False,False,False
146,False,False,False,False,False
147,False,False,False,False,False
148,False,False,False,False,False


In [None]:
# To get a direct overview: count the missing values in each column
df.isnull().sum()

sl             0
sw             0
pl             0
pw             0
flower_type    0
dtype: int64
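
The Iris data has no missing values, so every count above is 0. To see what these calls report when values are missing, here is a minimal sketch on a made-up frame with one incomplete row (`demo` and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame: row 1 has a missing number and a missing label
demo = pd.DataFrame({
    "sl": [5.1, float("nan"), 4.7],
    "flower_type": ["Iris-setosa", None, "Iris-setosa"],
})

# Missing values per column
missing_per_column = demo.isnull().sum()
print(missing_per_column["sl"], missing_per_column["flower_type"])  # 1 1

# Is anything missing anywhere in the frame?
has_any_missing = bool(demo.isnull().any().any())
print(has_any_missing)                     # True

# Rows that contain at least one missing value
rows_with_missing = demo[demo.isnull().any(axis=1)]
print(rows_with_missing.index.tolist())    # [1]

# dropna() returns a new frame without those rows
complete_rows = demo.dropna()
print(len(complete_rows))                  # 2
```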

###**Selection**

####**iloc[]**

We can use the **iloc[ ]** indexer to access values in a DataFrame by position. Because it is an indexer rather than a function, it takes square brackets, not parentheses.

iloc[ ] is primarily integer-position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:
1. An integer, e.g. 5.
2. A list or array of integers, e.g. [4, 3, 0].
3. A slice object with ints, e.g. 1:7.
4. A boolean array.

In [None]:
df.iloc[1:4, 2:4]

Unnamed: 0,pl,pw
1,1.4,0.2
2,1.3,0.2
3,1.5,0.2
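
Each of the four allowed input kinds listed above can be tried on a small frame. This is a sketch with made-up values; `demo` is an illustrative name. Note that slices in iloc exclude the end position, which is why `1:4` above returned rows 1 to 3.

```python
import pandas as pd

# Hypothetical four-row frame
demo = pd.DataFrame({
    "sl": [5.1, 4.9, 4.7, 4.6],
    "sw": [3.5, 3.0, 3.2, 3.1],
    "pl": [1.4, 1.4, 1.3, 1.5],
})

# 1. A single integer: the row at that position, returned as a Series
row = demo.iloc[1]
print(row["sl"])                  # 4.9

# 2. A list of integers: those rows, in the order given
picked = demo.iloc[[3, 0]]
print(picked.index.tolist())      # [3, 0]

# 3. Slices for rows and columns: the end position is excluded
block = demo.iloc[1:3, 0:2]
print(block.shape)                # (2, 2)

# 4. A boolean list with one entry per row
masked = demo.iloc[[True, False, True, False]]
print(masked.index.tolist())      # [0, 2]
```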


####**loc[ ]**

This accesses a group of rows and columns by label(s) or a boolean array.

**.loc[ ]** is primarily label based, but may also be used with a boolean array.

Allowed inputs are:
1. A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
2. A list or array of labels, e.g. ['a', 'b', 'c'].
3. A slice object with labels, e.g. 'a':'f'.
4. A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

In [None]:
df1 = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
     index=['cobra', 'viper', 'sidewinder'],
     columns=['max_speed', 'shield'])
df1

Unnamed: 0,max_speed,shield
cobra,1,2
viper,4,5
sidewinder,7,8


In [None]:
df1.loc['viper']

max_speed    4
shield       5
Name: viper, dtype: int64

In [None]:
df1.loc[['viper', 'sidewinder']]

Unnamed: 0,max_speed,shield
viper,4,5
sidewinder,7,8
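
The remaining input kinds, a label slice and a boolean array, can be sketched on the same data. The frame is rebuilt here under the name `snakes` (an illustrative name) so the example runs on its own. Unlike iloc, a label slice in loc includes its end label.

```python
import pandas as pd

# Same data as the example above, rebuilt so this block is self-contained
snakes = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# Label slice: both 'cobra' and 'viper' are included
sliced = snakes.loc["cobra":"viper"]
print(sliced.index.tolist())       # ['cobra', 'viper']

# Row label and column label together give a single value
print(snakes.loc["cobra", "shield"])   # 2

# Boolean Series built from a condition
strong = snakes.loc[snakes["shield"] > 4]
print(strong.index.tolist())       # ['viper', 'sidewinder']

# Plain boolean list with one entry per row
last_only = snakes.loc[[False, False, True]]
print(last_only.index.tolist())    # ['sidewinder']
```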


###**DataFrame from a List of Dictionaries**

In [None]:
# Each dictionary becomes one row; the keys become the column names
mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
          {'a': 100, 'b': 200, 'c': 300, 'd': 400},
          {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
df1 = pd.DataFrame(mydict)
df1

Unnamed: 0,a,b,c,d
0,1,2,3,4
1,100,200,300,400
2,1000,2000,3000,4000
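
The example above passes a list of dictionaries, one per row. A single dictionary of lists, one list per column, builds the same table. The sketch below compares the two and shows what happens when a row dictionary lacks a key; the names `by_rows`, `by_columns`, and `ragged` are illustrative.

```python
import pandas as pd

# Row-wise: a list of dictionaries, one dictionary per row
by_rows = pd.DataFrame([{"a": 1, "b": 2}, {"a": 100, "b": 200}])

# Column-wise: one dictionary, each key maps to a full column
by_columns = pd.DataFrame({"a": [1, 100], "b": [2, 200]})

print(by_rows.equals(by_columns))        # True

# A key missing from one row dictionary becomes a missing value (NaN)
ragged = pd.DataFrame([{"a": 1, "b": 2}, {"a": 100}])
print(ragged["b"].isnull().tolist())     # [False, True]
```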


##**Manipulating data**