## **PANDAS: INTRODUCTION**
> It is often said that 80% of data analysis is spent on cleaning and preparing data. To get a handle on the problem, this section focuses on a small but important aspect of data manipulation and cleaning with Pandas.
### **Data Structures**
**There are two data structures in Pandas -**<br>
* **Series -** It is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

* **DataFrame -** It is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet, a SQL table, or a dict of Series objects.
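
To make the relationship between the two structures concrete, here is a minimal sketch. The names `names`, `ages`, and `people` are illustrative and not part of the original notebook. It builds a DataFrame from two Series that share an index, then pulls one column back out as a Series.

```python
import pandas as pd

# Two Series sharing the same index labels (illustrative data)
names = pd.Series(['Amal', 'Kamal'], index=[100, 101])
ages = pd.Series([34, 35], index=[100, 101])

# A DataFrame can be built from a dict of Series; each key becomes a column
people = pd.DataFrame({'name': names, 'age': ages})

# Selecting a single column returns a Series with the same index
age_column = people['age']
print(type(people).__name__, people.shape)
print(type(age_column).__name__, age_column.shape)
```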

### **Series Data Structure:**
**pandas.core.series.Series(data, index, dtype, copy)**<br>
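* **data -** the values to store; it can be an ndarray, a list, a scalar constant, a dictionary, etc.<br>
* **index -** the labels for the values. Labels must be hashable; they do not have to be unique, although unique labels make lookups unambiguous.<br>
* **dtype -** the data type of the values; if omitted, it is inferred from the data.<br>
* **copy -** whether to copy the input data; its default value is False. It only affects Series or one-dimensional ndarray inputs.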
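
The `dtype` and `index` parameters are not exercised much in the cells that follow, so here is a small sketch of both. The variable names are made up for illustration, and the inferred dtype of the first Series is printed rather than assumed because it can depend on the platform.

```python
import pandas as pd

# Without dtype, pandas infers the element type from the data
s_inferred = pd.Series([1, 2, 3])

# dtype forces the element type instead of letting pandas infer it
s_float = pd.Series([1, 2, 3], dtype='float64')

# Index labels must be hashable, but duplicates are allowed
s_labeled = pd.Series([1, 2, 3], index=['a', 'b', 'a'], dtype='float64')

print(s_inferred.dtype, s_float.dtype)
print(s_labeled.index.is_unique)
```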

In [1]:
# importing required modules
import pandas as pd
import numpy as np

In [7]:
# creating empty Series
import warnings
warnings.filterwarnings('ignore')
s = pd.Series(dtype = 'float64')  # an explicit dtype avoids the default-dtype warning for empty Series
print (s, type(s))

Series([], dtype: float64) <class 'pandas.core.series.Series'>


In [4]:
# create a Series from a ndarray
arr_data = np.array(['apple', 'banana', 'cherry', 'pineapple'])
s = pd.Series(data = arr_data)
print (s, type(s), s[0], s[3])

0        apple
1       banana
2       cherry
3    pineapple
dtype: object <class 'pandas.core.series.Series'> apple pineapple


In [12]:
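# copy = False: the Series shares memory with arr_data, so a change to either one shows up in both
arr_data = np.array([100, 300, 200, 600, 500])
s = pd.Series(arr_data, copy = False)
s[0] = 999; arr_data[2] = 888
print (arr_data, type(arr_data), "\n", s, type(s))

# copy = True: the Series gets its own copy, so the two no longer affect each other
arr_data = np.array([100, 300, 200, 600, 500])
s = pd.Series(arr_data, copy = True)
s[0] = 999; arr_data[2] = 888
print (arr_data, type(arr_data), "\n", s, type(s))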

[999 300 888 600 500] <class 'numpy.ndarray'> 
 0    999
1    300
2    888
3    600
4    500
dtype: int32 <class 'pandas.core.series.Series'>
[100 300 888 600 500] <class 'numpy.ndarray'> 
 0    999
1    300
2    200
3    600
4    500
dtype: int32 <class 'pandas.core.series.Series'>
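
The effect of `copy = True` can also be checked directly. The sketch below is an illustration with made-up variable names; it relies only on the `copy=True` case, since whether `copy=False` shares memory may differ between pandas versions.

```python
import numpy as np
import pandas as pd

arr = np.array([100, 300, 200])

# copy=True gives the Series its own buffer
s_copy = pd.Series(arr, copy=True)

# Changing the source array afterwards does not reach the copied Series
arr[0] = -1
print(arr[0], s_copy.iloc[0])

# np.shares_memory reports whether two arrays overlap in memory
shared = np.shares_memory(arr, s_copy.to_numpy())
print(shared)
```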


In [15]:
arr_data = np.array(['apple', 'banana', 'cherry', 'pineapple'])
print (arr_data, type(arr_data))

s = pd.Series(data = arr_data, index = [100, 101, 102, 103])
print (s)
print (s[100], type(s[100]), s[103], type(s[103]))

['apple' 'banana' 'cherry' 'pineapple'] <class 'numpy.ndarray'>
100        apple
101       banana
102       cherry
103    pineapple
dtype: object
apple <class 'str'> pineapple <class 'str'>


In [19]:
# index labels may repeat; looking up a repeated label returns a Series with every match
s = pd.Series(data = arr_data, index = [100, 101, 100, 103])
print (s)
print (s[100], type(s[100]), s[103], type(s[103]))

100        apple
101       banana
100       cherry
103    pineapple
dtype: object
100     apple
100    cherry
dtype: object <class 'pandas.core.series.Series'> pineapple <class 'str'>
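
Because a repeated label changes the type of the lookup result, it can help to test for duplicates before indexing. The following is a small sketch using `Index.is_unique` and `Index.duplicated`; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry', 'pineapple'],
              index=[100, 101, 100, 103])

# is_unique tells whether every label appears exactly once
print(s.index.is_unique)

# A repeated label selects all matching rows, so the result is a Series
dup = s.loc[100]
# A label that appears once selects a single value
single = s.loc[103]
print(type(dup).__name__, type(single).__name__)

# duplicated() marks the second and later occurrences of a label
dup_mask = s.index.duplicated().tolist()
print(dup_mask)
```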


In [17]:
arr_data = np.array(['apple', 'banana', 'cherry', 'pineapple'])
s = pd.Series(data = arr_data, index = ['fruit-1', 'fruit-2', 'fruit-3', 'fruit-4'])
print (s)
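# label lookup with [], positional lookup with .iloc (integer positions via [] are deprecated on a labeled index)
print (s['fruit-1'], s.iloc[0], s['fruit-3'], s.iloc[2])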

fruit-1        apple
fruit-2       banana
fruit-3       cherry
fruit-4    pineapple
dtype: object
apple apple cherry cherry


In [21]:
# create a Series from a dictionary
dict_data = {'apple':100, 'banana':202, 'coconut':450, 'mango':435}
s = pd.Series(dict_data)
print (s)
s = pd.Series(dict_data, index = ['banana', 'mango', 'apple', 'coconut'])
print (s)

apple      100
banana     202
coconut    450
mango      435
dtype: int64
banana     202
mango      435
apple      100
coconut    450
dtype: int64


In [24]:
dict_data = {'apple':100, 'banana':202, 'coconut':450, 'mango':435}
s = pd.Series(dict_data, index = ['banana', 'mango', 'apple', 'coconut'])
print (s)
s = pd.Series(data = dict_data, index = 
                    ['banana', 'lime', 'coconut', 'mango', 'guava', 'apple', 'mango', 'apple', 'coconut'])
print (s)
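# labels missing from the dictionary ('lime', 'guava') become NaN, which turns the values into floats
print (s['banana'], s['lime'], s.iloc[4], s.iloc[5])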

banana     202
mango      435
apple      100
coconut    450
dtype: int64
banana     202.0
lime         NaN
coconut    450.0
mango      435.0
guava        NaN
apple      100.0
mango      435.0
apple      100.0
coconut    450.0
dtype: float64
202.0 nan nan 100.0
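
The NaN entries and the switch from `int64` to `float64` above come from labels that have no matching dictionary key. A minimal sketch of how one might find and drop those labels follows; the data and names are illustrative.

```python
import pandas as pd

prices = {'apple': 100, 'banana': 202}

# 'lime' is not a key of the dict, so its value becomes NaN
s = pd.Series(prices, index=['banana', 'lime', 'apple'])

# NaN is a float, so the integer values are upcast to float64
print(s.dtype)

# isna() flags the labels that had no matching key
missing_labels = s[s.isna()].index.tolist()
print(missing_labels)

# dropna() keeps only the labels that were found; the values can then be cast back to integers
found = s.dropna().astype('int64')
print(found.to_dict())
```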


In [25]:
# create a Series from a scalar
s = pd.Series(5, index = [0, 1, 2, 3, 4])
print(s)
s = pd.Series(5, index = [0, 1, 2, 0, 1, 2])
print(s)

0    5
1    5
2    5
3    5
4    5
dtype: int64
0    5
1    5
2    5
0    5
1    5
2    5
dtype: int64


In [33]:
# Create a Series from a list
s = pd.Series(data = [101, 303, 202, 404, 505], index = ['red', 'blue', 'brown', 'black', 'silver'])
print (s)

red       101
blue      303
brown     202
black     404
silver    505
dtype: int64


### **DataFrame Data Structure:**

#### Create DataFrame

In [7]:
data_dict = {'emp_name':['Amal', 'Kamal', 'Bimal', 'Shyamal'], 'emp_age':[34, 35, 45, 43]}
df = pd.DataFrame(data_dict)
df

  emp_name  emp_age
0     Amal       34
1    Kamal       35
2    Bimal       45
3  Shyamal       43


In [8]:
data_dict = {'emp_name':['Amal', 'Kamal', 'Bimal', 'Shyamal'], 'emp_age':[34, 35, 45, 43]}
df = pd.DataFrame(data = data_dict)
df

  emp_name  emp_age
0     Amal       34
1    Kamal       35
2    Bimal       45
3  Shyamal       43


In [17]:
data_dict = {'emp_name':['Amal', 'Kamal', 'Bimal', 'Shyamal'], 'emp_age':[34, 35, 45, 43]}
emp_id = [100, 101, 102, 103]
df = pd.DataFrame(data = data_dict, index = emp_id)
df

    emp_name  emp_age
100     Amal       34
101    Kamal       35
102    Bimal       45
103  Shyamal       43


In [18]:
data_dict = {'emp_name':['Amal', 'Kamal', 'Bimal', 'Shyamal'], 'emp_age':[34, 35, 45, 43]}
emp_id = [100, 101, 102, 103]
df = pd.DataFrame(data = data_dict, index = emp_id)
print (df)
df = df.reset_index()
df

    emp_name  emp_age
100     Amal       34
101    Kamal       35
102    Bimal       45
103  Shyamal       43


   index emp_name  emp_age
0    100     Amal       34
1    101    Kamal       35
2    102    Bimal       45
3    103  Shyamal       43
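
`reset_index` names the new column `index` when the old index has no name. The sketch below shows a few related options under illustrative names: naming the index first with `rename_axis`, discarding it with `drop=True`, and restoring it with `set_index`.

```python
import pandas as pd

df = pd.DataFrame({'emp_name': ['Amal', 'Kamal'], 'emp_age': [34, 35]},
                  index=[100, 101])

# Naming the index first gives the new column a meaningful header
named = df.rename_axis('emp_id').reset_index()
print(list(named.columns))

# drop=True discards the old index instead of keeping it as a column
dropped = df.reset_index(drop=True)
print(list(dropped.columns), list(dropped.index))

# set_index reverses the operation
restored = named.set_index('emp_id')
print(list(restored.index))
```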


In [19]:
user_data = [['alice', 19, 'F', 'student'], ['john', 26, 'M', 'student']]  # list of lists
user1 = pd.DataFrame(user_data)
user1

       0   1  2        3
0  alice  19  F  student
1   john  26  M  student


In [18]:
user_data = [['alice', 19, 'F', 'student'], ['john', 26, 'M', 'student']]  # list of lists
user_columns = ['name', 'age', 'gender', 'job']
user1 = pd.DataFrame(data = user_data, columns = user_columns)
user1

    name  age gender      job
0  alice   19      F  student
1   john   26      M  student


In [19]:
user_data = dict(name = ['eric', 'paul'], age = [22, 58], gender = ['M', 'F'], job = ['student', 'manager'])
print (user_data)
user2 = pd.DataFrame(data = user_data)
user2

{'name': ['eric', 'paul'], 'age': [22, 58], 'gender': ['M', 'F'], 'job': ['student', 'manager']}


   name  age gender      job
0  eric   22      M  student
1  paul   58      F  manager


In [4]:
user_data = {'name': ['peter', 'julie'], 'age': [33, 44], 'gender': ['M', 'F'], 'job': ['engineer', 'scientist']}
user3 = pd.DataFrame(data = user_data)
user3

    name  age gender        job
0  peter   33      M   engineer
1  julie   44      F  scientist


#### Concatenate DataFrame

In [20]:
print (user1)
print ()
print (user2)
print ()
print (user3)

    name  age gender      job
0  alice   19      F  student
1   john   26      M  student

   name  age gender      job
0  eric   22      M  student
1  paul   58      F  manager

    name  age gender        job
0  peter   33      M   engineer
1  julie   44      F  scientist


In [8]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
users = pd.concat([user1, user2])
users

    name  age gender      job
0  alice   19      F  student
1   john   26      M  student
0   eric   22      M  student
1   paul   58      F  manager


In [9]:
# DataFrame.append was removed in pandas 2.0; ignore_index = True renumbers the rows from 0
users = pd.concat([user1, user2], ignore_index = True)
users

    name  age gender      job
0  alice   19      F  student
1   john   26      M  student
2   eric   22      M  student
3   paul   58      F  manager


In [10]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
users = pd.concat([users, user3], ignore_index = True)
users

    name  age gender        job
0  alice   19      F    student
1   john   26      M    student
2   eric   22      M    student
3   paul   58      F    manager
4  peter   33      M   engineer
5  julie   44      F  scientist


In [11]:
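# DataFrame.append was removed in pandas 2.0; nested pd.concat calls reproduce the chained append
users = pd.concat([pd.concat([user1, user2]), user3], ignore_index = True)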
users

    name  age gender        job
0  alice   19      F    student
1   john   26      M    student
2   eric   22      M    student
3   paul   58      F    manager
4  peter   33      M   engineer
5  julie   44      F  scientist


In [21]:
users = pd.concat([user1, user2, user3])
users

    name  age gender        job
0  alice   19      F    student
1   john   26      M    student
0   eric   22      M    student
1   paul   58      F    manager
0  peter   33      M   engineer
1  julie   44      F  scientist


In [24]:
users = pd.concat([user1, user2, user3], ignore_index = True)
users

    name  age gender        job
0  alice   19      F    student
1   john   26      M    student
2   eric   22      M    student
3   paul   58      F    manager
4  peter   33      M   engineer
5  julie   44      F  scientist
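
Besides `ignore_index`, `pd.concat` offers other ways to deal with the repeated row labels seen above. The sketch below, with illustrative frames named `first` and `second`, uses `keys` to record which frame each row came from and `verify_integrity` to raise an error when labels overlap.

```python
import pandas as pd

cols = ['name', 'age']
first = pd.DataFrame([['alice', 19], ['john', 26]], columns=cols)
second = pd.DataFrame([['eric', 22], ['paul', 58]], columns=cols)

# keys adds an outer index level recording which frame each row came from
tagged = pd.concat([first, second], keys=['first', 'second'])
print(tagged.loc['second'])

# verify_integrity=True raises ValueError when the result would have duplicate labels
try:
    pd.concat([first, second], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)
```

Which option fits depends on whether the original row labels carry meaning: `ignore_index` throws them away, while `keys` preserves them under an extra level.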
