## Introduction to Pandas: Data Manipulation

> It is often said that 80% of data analysis is spent cleaning and preparing data. This section focuses on a small but important part of that work: data manipulation and cleaning with Pandas.

### Data Structures in Pandas

**There are two main data structures in Pandas -**
* **Series -** A one-dimensional labeled array capable of holding any data type (e.g. integers, floating point numbers, strings, lists, dictionaries, Python objects etc.). The axis labels are collectively referred to as the index.
* **Data Frame -** A two-dimensional labeled data structure with columns of potentially different types. You can think of a Data Frame as an SQL table or an MS Excel spreadsheet.
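
As a rough illustration of the two structures described above, the sketch below builds one of each. The ages and the variable names `ages` and `people` are made up for the example and are not part of the original notebook.

```python
import pandas as pd

# A Series: one-dimensional, every value is paired with an index label.
# The ages are hypothetical illustrative values.
ages = pd.Series([31, 28, 45], index=["Kol", "Mum", "Che"])

# A Data Frame: two-dimensional, columns may hold different types
# and all columns share the same index.
people = pd.DataFrame(
    {"name": ["Amit", "Kamal", "Hari"], "age": [31, 28, 45]},
    index=["Kol", "Mum", "Che"],
)

print(ages.ndim, people.ndim)    # 1 2
print(people.dtypes)             # one dtype per column
print(type(people["age"]))       # selecting a single column gives a Series
```

Selecting one column of a Data Frame returns a Series, which is why the rest of this section starts with the Series structure.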

### Series Data Structure

**pandas.core.series.Series(data, index, dtype, copy)**
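* **data -** the values to store; may be an ndarray, a list, a dictionary or a scalar value
* **index -** labels used for data access and identification; values must be hashable and the same length as data, but they need not be unique (defaults to consecutive integers starting at 0 when omitted)
* **dtype -** the data type of the values; inferred from data when omitted
* **copy -** whether to copy the input data, default is False (only affects Series on ndarray data sources)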
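
The parameters listed above can be tried out with a few small inputs. This is a minimal sketch with made-up values; the variable names `s_list`, `s_dict` and `s_scalar` are only for illustration.

```python
import pandas as pd

# data as a Python list, with an explicit dtype
s_list = pd.Series([1, 2, 3], dtype="float64")

# data as a dictionary: the keys become the index labels
s_dict = pd.Series({"Kol": "Amit", "Mum": "Kamal"})

# data as a scalar: the value is repeated for every index label
s_scalar = pd.Series(7, index=["a", "b", "c"])

print(s_list.dtype)          # float64
print(list(s_dict.index))    # ['Kol', 'Mum']
print(s_scalar.tolist())     # [7, 7, 7]
```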

In [1]:
# importing required modules
import pandas as pd
import numpy as np

In [6]:
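# creating a Series from an ndarray; with no index given, the labels default to 0, 1, 2, ...
my_data = np.array(['Amit', 'Kamal', 'Hari', 'Imtiaz'])
print(my_data, type(my_data))
s = pd.Series(data=my_data)
print(s)
print(type(s))
# look up elements by their default integer labels
print(s[0], s[3])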

['Amit' 'Kamal' 'Hari' 'Imtiaz'] <class 'numpy.ndarray'>
0      Amit
1     Kamal
2      Hari
3    Imtiaz
dtype: object
<class 'pandas.core.series.Series'>
Amit Imtiaz


In [8]:
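# creating a Series from an ndarray with custom string labels as the index
my_data = np.array(['Amit', 'Kamal', 'Hari', 'Imtiaz'])
print(my_data, type(my_data))
s = pd.Series(data=my_data, index=['Kol', 'Mum', 'Che', 'Del'])
print(s)
print(type(s))
# label-based lookup
print(s['Mum'], s['Del'])
# position-based lookup; .iloc is explicit, since s[0] on a non-integer index is deprecated in recent pandas
print(s.iloc[0], s.iloc[3])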

['Amit' 'Kamal' 'Hari' 'Imtiaz'] <class 'numpy.ndarray'>
Kol      Amit
Mum     Kamal
Che      Hari
Del    Imtiaz
dtype: object
<class 'pandas.core.series.Series'>
Kamal Imtiaz
Amit Imtiaz


In [9]:
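# same Series as in the previous cell, passing the data as the first positional argument
my_data = np.array(['Amit', 'Kamal', 'Hari', 'Imtiaz'])
print(my_data, type(my_data))
s = pd.Series(my_data, index=['Kol', 'Mum', 'Che', 'Del'])
print(s)
print(type(s))
# label-based lookup
print(s['Mum'], s['Del'])
# position-based lookup; .iloc is explicit, since s[0] on a non-integer index is deprecated in recent pandas
print(s.iloc[0], s.iloc[3])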

['Amit' 'Kamal' 'Hari' 'Imtiaz'] <class 'numpy.ndarray'>
Kol      Amit
Mum     Kamal
Che      Hari
Del    Imtiaz
dtype: object
<class 'pandas.core.series.Series'>
Kamal Imtiaz
Amit Imtiaz


In [15]:
# creating a Series whose index contains duplicate string labels
my_data = np.array(['Amit', 'Kamal', 'Hari', 'Imtiaz', 'Prasenjit'])
print(my_data, type(my_data))
s = pd.Series(data=my_data, index=['Kol', 'Mum', 'Kol', 'Mum', 'Del'])
print(s)
print(type(s))
# a repeated label returns a Series holding every match
print(s['Mum'])
print(s['Kol'], type(s['Kol']))
# a unique label returns a single value
print(s['Del'], type(s['Del']))
# print(s.iloc[0], s.iloc[3])

['Amit' 'Kamal' 'Hari' 'Imtiaz' 'Prasenjit'] <class 'numpy.ndarray'>
Kol         Amit
Mum        Kamal
Kol         Hari
Mum       Imtiaz
Del    Prasenjit
dtype: object
<class 'pandas.core.series.Series'>
Mum     Kamal
Mum    Imtiaz
dtype: object
Kol    Amit
Kol    Hari
dtype: object <class 'pandas.core.series.Series'>
Prasenjit <class 'str'>
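
Because a repeated label changes the return type of a lookup, it can help to check the index for duplicates first. The sketch below uses a shortened version of the data above. `Index.is_unique` and `Index.duplicated()` are existing pandas attributes, and the three-element Series is only an illustration.

```python
import pandas as pd

s = pd.Series(["Amit", "Kamal", "Hari"], index=["Kol", "Mum", "Kol"])

print(s.index.is_unique)                # False
print(s.index.duplicated().tolist())    # [False, False, True]

# .loc returns a Series for a repeated label and a scalar for a unique one
print(type(s.loc["Kol"]).__name__)      # Series
print(type(s.loc["Mum"]).__name__)      # str
```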


In [18]:
# creating a Series with custom integer labels as the index
my_data = np.array(['Amit', 'Kamal', 'Hari', 'Imtiaz', 'Prasenjit'])
print(my_data, type(my_data))
s = pd.Series(data=my_data, index=[101, 102, 103, 104, 105])
print(s)
print(type(s))
# with an integer index, s[101] looks up the label 101, not position 101
print(s[101])
print(s[103])

['Amit' 'Kamal' 'Hari' 'Imtiaz' 'Prasenjit'] <class 'numpy.ndarray'>
101         Amit
102        Kamal
103         Hari
104       Imtiaz
105    Prasenjit
dtype: object
<class 'pandas.core.series.Series'>
Amit
Hari
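
With integer labels, plain square brackets always mean "label", which can be surprising when a position was intended. The sketch below spells out the intent with `.loc` (label) and `.iloc` (position), using a shortened version of the data above.

```python
import pandas as pd

s = pd.Series(["Amit", "Kamal", "Hari"], index=[101, 102, 103])

print(s.loc[101])    # label-based lookup -> Amit
print(s.iloc[0])     # position-based lookup, same element -> Amit

# With an integer index, plain brackets are treated as labels,
# so a position such as 0 is not found.
try:
    s[0]
except KeyError:
    print("0 is not a label in this index")
```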


In [19]:
# creating a Series whose index contains a duplicate integer label (101)
my_data = np.array(['Amit', 'Kamal', 'Hari', 'Imtiaz', 'Prasenjit'])
print(my_data, type(my_data))
s = pd.Series(data=my_data, index=[101, 102, 103, 101, 105])
print(s)
print(type(s))
# the repeated label returns a Series, the unique label a single value
print(s[101])
print(s[103])

['Amit' 'Kamal' 'Hari' 'Imtiaz' 'Prasenjit'] <class 'numpy.ndarray'>
101         Amit
102        Kamal
103         Hari
101       Imtiaz
105    Prasenjit
dtype: object
<class 'pandas.core.series.Series'>
101      Amit
101    Imtiaz
dtype: object
Hari


In [24]:
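# copy = False: the Series can share memory with arr1,
# so in this run a change made through either one shows up in both
arr1 = np.array([100, 200, 300, 400, 500])
print(arr1, type(arr1))
s = pd.Series(data=arr1, copy=False)
print(s)
print(type(s))
s[0] = 111
arr1[1] = 222
print(arr1)
print(s)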

[100 200 300 400 500] <class 'numpy.ndarray'>
0    100
1    200
2    300
3    400
4    500
dtype: int32
<class 'pandas.core.series.Series'>
[111 222 300 400 500]
0    111
1    222
2    300
3    400
4    500
dtype: int32


In [25]:
# copy = True: the Series gets its own copy of the data,
# so arr1 and s change independently
arr1 = np.array([100, 200, 300, 400, 500])
print(arr1, type(arr1))
s = pd.Series(data=arr1, copy=True)
print(s)
print(type(s))
s[0] = 111
arr1[1] = 222
print(arr1)
print(s)

[100 200 300 400 500] <class 'numpy.ndarray'>
0    100
1    200
2    300
3    400
4    500
dtype: int32
<class 'pandas.core.series.Series'>
[100 222 300 400 500]
0    111
1    200
2    300
3    400
4    500
dtype: int32
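
Instead of modifying values to see the effect of `copy`, one possible check is to ask NumPy whether the Series and the source array overlap in memory. This is a sketch, not part of the original notebook. The result for `copy=False` may depend on the pandas version and its copy-on-write settings, so no expected output is given for that line.

```python
import numpy as np
import pandas as pd

arr = np.array([100, 200, 300])

shared = pd.Series(arr, copy=False)   # asks pandas not to copy the array
copied = pd.Series(arr, copy=True)    # forces an independent copy

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(arr, shared.to_numpy()))
print(np.shares_memory(arr, copied.to_numpy()))   # False: the copy has its own buffer
```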
