## Introduction to Pandas

The name pandas is derived from the term *panel data*, an econometrics term for data sets that include observations over multiple time periods for the same individuals.

pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.

* It is built on top of NumPy and is part of the SciPy ecosystem (scientific computing tools for Python).

* It is used with statsmodels, sklearn-pandas, Plotly, IPython, Jupyter and Spyder.




## Key Features of Pandas
> * Fast and efficient DataFrame object with default and customized indexing.
> * Tools for loading data into in-memory data objects from different file formats.
> * Data alignment and integrated handling of missing data.
> * Reshaping and pivoting of data sets.
> * Label-based slicing, indexing and subsetting of large data sets.
> * Columns from a data structure can be deleted or inserted.
> * Group by data for aggregation and transformations.
> * High performance merging and joining of data.
> * Time Series functionality.
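
Two of these features, group-by aggregation and integrated missing-data handling, fit in a few lines. The sketch below uses a small made-up table; the column names `dept` and `salary` and all values are illustrative only.

```python
import numpy as np
import pandas as pd

# a small illustrative table; one salary is missing (NaN)
df = pd.DataFrame({
    "dept": ["HR", "IT", "HR", "IT"],
    "salary": [50000.0, 60000.0, np.nan, 70000.0],
})

# group by department and aggregate; mean() skips NaN by default
mean_salary = df.groupby("dept")["salary"].mean()
print(mean_salary)

# count the missing values in each column
missing = df.isna().sum()
print(missing)
```

HR averages to 50000.0 because the missing value is skipped rather than treated as zero.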

In [88]:
import numpy as np
import pandas as pd
# check the installed pandas version
pd.__version__

'1.3.5'

# Core components of pandas: Series and DataFrames

A **DataFrame** is a 2D data structure that can store data of different types (strings, integers, floating-point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or a `data.frame` in R.

Each column in a DataFrame is a **Series** (a one-dimensional array of values with an index), as the examples below show.
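
A minimal check of that relationship, using a two-row frame with illustrative values: selecting one column returns a Series that carries the DataFrame's index.

```python
import pandas as pd

df = pd.DataFrame({"employee_name": ["Sam", "Max"],
                   "employee_dept": ["Research", "HR"]})

# one column of a DataFrame is a Series sharing the DataFrame's index
names = df["employee_name"]
print(type(names).__name__)          # Series
print(names.index.equals(df.index))  # True
```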


## Creating a DataFrame from a Python dictionary

In [89]:
# declare a python dictionary
data = {
    'employee_name' : ['Sam','Max','Tony','Sarah', 'Tania'],
    'employee_dept' : ['Research', 'HR', 'Marketing', 'Sales', 'Finance']
}

In [90]:
data

{'employee_dept': ['Research', 'HR', 'Marketing', 'Sales', 'Finance'],
 'employee_name': ['Sam', 'Max', 'Tony', 'Sarah', 'Tania']}

In [91]:
employee_records = pd.DataFrame(data)

In [92]:
employee_records

|   | employee_name | employee_dept |
|---|---|---|
| 0 | Sam | Research |
| 1 | Max | HR |
| 2 | Tony | Marketing |
| 3 | Sarah | Sales |
| 4 | Tania | Finance |


In [93]:
# assign a custom index (here, employee IDs)
employee_records = pd.DataFrame(data, index = [4434, 3243, 3434, 4563, 4489])
employee_records

|   | employee_name | employee_dept |
|---|---|---|
| 4434 | Sam | Research |
| 3243 | Max | HR |
| 3434 | Tony | Marketing |
| 4563 | Sarah | Sales |
| 4489 | Tania | Finance |


In [94]:
employee_records.index.name = 'employee_id'
employee_records

| employee_id | employee_name | employee_dept |
|---|---|---|
| 4434 | Sam | Research |
| 3243 | Max | HR |
| 3434 | Tony | Marketing |
| 4563 | Sarah | Sales |
| 4489 | Tania | Finance |


In [95]:
# accessing employee details using their employee id
employee_records.loc[3434]

employee_name         Tony
employee_dept    Marketing
Name: 3434, dtype: object
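
Instead of passing the IDs as a separate list, an existing column can be promoted to the index with `set_index`. A minimal sketch on a cut-down, illustrative version of the frame above:

```python
import pandas as pd

records = pd.DataFrame({
    "employee_id": [4434, 3243, 3434],
    "employee_name": ["Sam", "Max", "Tony"],
    "employee_dept": ["Research", "HR", "Marketing"],
})

# move the employee_id column into the index (returns a new DataFrame)
by_id = records.set_index("employee_id")
print(by_id.index.name)                   # employee_id
print(by_id.loc[3434, "employee_dept"])   # Marketing
```

By default `set_index` drops the column it moves, so `employee_id` no longer appears among the columns.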

## Creating a Series from scratch

In [96]:
emp_id = pd.Series([4434,3243,3434,4563,4489], name='employee_id', index = [1,2,3,4,5])
emp_id

1    4434
2    3243
3    3434
4    4563
5    4489
Name: employee_id, dtype: int64
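
A Series can also be built from a dictionary, in which case the keys become the index labels. A sketch with illustrative values:

```python
import pandas as pd

# dictionary keys become the index; values become the data
emp_dept = pd.Series({4434: "Research", 3243: "HR", 3434: "Marketing"},
                     name="employee_dept")
print(emp_dept[3243])         # HR
print(list(emp_dept.index))   # [4434, 3243, 3434]
```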

## Creating a DataFrame from a NumPy array

In [97]:
# define three column names
columns = ['col_1','col_2','col_3']
# define a NumPy array of size 3x3
data = np.array([[1,3,7],[8,0,5],[4,7,2]])
# creating a pandas dataframe
sample_df = pd.DataFrame(data, columns=columns)
print(sample_df)

   col_1  col_2  col_3
0      1      3      7
1      8      0      5
2      4      7      2


In [98]:
data

array([[1, 3, 7],
       [8, 0, 5],
       [4, 7, 2]])

## Creating a DataFrame from multiple Series with the same index

In [99]:
s1 = pd.Series([10,20,30,40,50], index=[1,2,3,4,5])
s2 = pd.Series([21,34,43,55,73], index=[1,2,3,4,5])
sample_df = pd.DataFrame({'col_1':s1,'col_2':s2})
sample_df

|   | col_1 | col_2 |
|---|---|---|
| 1 | 10 | 21 |
| 2 | 20 | 34 |
| 3 | 30 | 43 |
| 4 | 40 | 55 |
| 5 | 50 | 73 |


In [100]:
s1 = pd.Series([10,20,30,40,50], index=[1,2,3,4,5])
s2 = pd.Series([21,34,43,55,73], index=[1,2,3,4,6])
sample_df = pd.DataFrame({'col_1':s1,'col_2':s2})
sample_df

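|   | col_1 | col_2 |
|---|---|---|
| 1 | 10.0 | 21.0 |
| 2 | 20.0 | 34.0 |
| 3 | 30.0 | 43.0 |
| 4 | 40.0 | 55.0 |
| 5 | 50.0 | NaN |
| 6 | NaN | 73.0 |

Label 5 exists only in `s1` and label 6 only in `s2`, so pandas builds the union of both indexes and fills the gaps with `NaN`. Because `NaN` is a floating-point value, both columns are upcast to `float64`.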
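
Two common ways to deal with those gaps are dropping incomplete rows or filling them. A sketch that reuses the same two Series; the fill value `0` is an arbitrary choice for illustration:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])
s2 = pd.Series([21, 34, 43, 55, 73], index=[1, 2, 3, 4, 6])
sample_df = pd.DataFrame({"col_1": s1, "col_2": s2})

# option 1: keep only rows present in both Series
complete = sample_df.dropna()

# option 2: replace the gaps with 0 and cast back to integers
filled = sample_df.fillna(0).astype("int64")

print(complete.index.tolist())   # [1, 2, 3, 4]
print(filled.loc[6, "col_1"])    # 0
```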


In [101]:
s1 = pd.Series([10,20,30,40,50], index=[1,2,3,4,5])
s2 = pd.Series([21,34,43,55,73], index=[2,4,3,1,5])
sample_df = pd.DataFrame({'col_1':s1,'col_2':s2})
sample_df

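|   | col_1 | col_2 |
|---|---|---|
| 1 | 10 | 55 |
| 2 | 20 | 21 |
| 3 | 30 | 43 |
| 4 | 40 | 34 |
| 5 | 50 | 73 |

Values are matched by index label, not by position: in `s2` the value 55 carries label 1, so it lands in row 1 next to the 10 from `s1`.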
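
If the values should instead be paired by position, one option is to strip `s2`'s labels before building the frame, for example by passing its underlying NumPy array; the array then simply takes the index from `s1`. A minimal sketch:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])
s2 = pd.Series([21, 34, 43, 55, 73], index=[2, 4, 3, 1, 5])

# to_numpy() drops s2's labels, so its values are placed by position
by_position = pd.DataFrame({"col_1": s1, "col_2": s2.to_numpy()})
print(by_position.loc[1, "col_2"])   # 21
```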


## Getting data types

In [102]:
sample_df.dtypes

col_1    int64
col_2    int64
dtype: object

In [103]:
df1 = pd.DataFrame(
    {
        "A": np.random.rand(3),
        "B": 1,
        "C": "foo",
        "D": pd.Timestamp("20010102"),
        "E": pd.Series([1.0] * 3).astype("float32"),
        "F": False,
        "G": pd.Series([1] * 3, dtype="float64"),
    }
)

In [104]:
df1.dtypes

A           float64
B             int64
C            object
D    datetime64[ns]
E           float32
F              bool
G           float64
dtype: object
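
Column types can be changed with `astype`, and columns can be filtered by type with `select_dtypes`. A short sketch on an illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2, 3], "C": ["foo", "bar", "baz"]})

# convert an integer column to float64
df["B"] = df["B"].astype("float64")

# keep only the numeric columns
numeric = df.select_dtypes(include="number")
print(df.dtypes)
print(list(numeric.columns))   # ['B']
```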

# Creating the Employee DataFrame

In [105]:
employee_records = pd.DataFrame({
        'employee_name': ['Sam', 'Max', 'Tony', 'Sarah', 'Tania', 'David', 
                         'Mark','Alice', 'Charles', 'Bob', 'Anna'],
        'employee_dept': ['Research','HR','Marketing','Sales', 'Finance', 'IT', 'HR', 'Marketing', 'IT', 'Finance','Sales'],
        'employee_id' : [10001, 10002, 10003, 10004, 10005, 10006, 10007, 10008, 10009, 10010, 10011],
        'salary'     : [45034.88, 65343.45, 53423.27, 76422.34, 58753.00, 34323.44, 66544.60, 34354.66, 55234.96, 39078.60, 44567.88]
    })
employee_records

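|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |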


In [106]:
# first 5 rows (head() returns n=5 rows by default)
employee_records.head()

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |


In [107]:
# last 5 rows (tail() also defaults to n=5)
employee_records.tail()

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |


In [108]:
# the index object (a RangeIndex by default)
employee_records.index

RangeIndex(start=0, stop=11, step=1)

In [109]:
# the index labels as a NumPy array
employee_records.index.values

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [110]:
# displaying column names
employee_records.columns


Index(['employee_name', 'employee_dept', 'employee_id', 'salary'], dtype='object')

In [111]:
# alternatively
employee_records.keys()

Index(['employee_name', 'employee_dept', 'employee_id', 'salary'], dtype='object')

### NumPy

In [112]:
# NumPy representation of the underlying data (mixed column types give dtype=object)
employee_records.to_numpy()

array([['Sam', 'Research', 10001, 45034.88],
       ['Max', 'HR', 10002, 65343.45],
       ['Tony', 'Marketing', 10003, 53423.27],
       ['Sarah', 'Sales', 10004, 76422.34],
       ['Tania', 'Finance', 10005, 58753.0],
       ['David', 'IT', 10006, 34323.44],
       ['Mark', 'HR', 10007, 66544.6],
       ['Alice', 'Marketing', 10008, 34354.66],
       ['Charles', 'IT', 10009, 55234.96],
       ['Bob', 'Finance', 10010, 39078.6],
       ['Anna', 'Sales', 10011, 44567.88]], dtype=object)
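
The array above has `dtype=object` because the columns mix strings and numbers, so NumPy needs a type that can hold all of them. Selecting only numeric columns first yields a numeric array instead. A sketch on two rows of the same data:

```python
import pandas as pd

records = pd.DataFrame({
    "employee_name": ["Sam", "Max"],
    "employee_id": [10001, 10002],
    "salary": [45034.88, 65343.45],
})

mixed = records.to_numpy()                                # strings + numbers -> object
numeric = records[["employee_id", "salary"]].to_numpy()  # int64 + float64 -> float64
print(mixed.dtype, numeric.dtype)   # object float64
```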

In [113]:
employee_records

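|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |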


# Generating a statistical summary

In [114]:
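# by default, describe() summarizes only the numeric columns
employee_records.describe()

|   | employee_id | salary |
|---|---|---|
| count | 11.0 | 11.0 |
| mean | 10006.0 | 52098.28 |
| std | 3.316625 | 13923.224633 |
| min | 10001.0 | 34323.44 |
| 25% | 10003.5 | 41823.24 |
| 50% | 10006.0 | 53423.27 |
| 75% | 10008.5 | 62048.225 |
| max | 10011.0 | 76422.34 |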
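
The 25%, 50% and 75% rows are quantiles, which pandas computes with linear interpolation by default. For 11 salaries the 25% position in the sorted list is (11 - 1) × 0.25 = 2.5, so the value lies halfway between the 3rd and 4th smallest salaries. A sketch that reproduces 41823.24 both ways:

```python
import pandas as pd

salaries = pd.Series([45034.88, 65343.45, 53423.27, 76422.34, 58753.00, 34323.44,
                      66544.60, 34354.66, 55234.96, 39078.60, 44567.88])

# pandas' default: linear interpolation between neighbouring sorted values
q25 = salaries.quantile(0.25)

# the same number by hand: position (n - 1) * 0.25 = 2.5 in the sorted values
s = sorted(salaries)
manual = s[2] + 0.5 * (s[3] - s[2])
print(round(q25, 2), round(manual, 2))   # 41823.24 41823.24
```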


In [115]:
employee_records.describe(include='all')

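|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| count | 11 | 11 | 11.0 | 11.0 |
| unique | 11 | 6 | NaN | NaN |
| top | Sam | HR | NaN | NaN |
| freq | 1 | 2 | NaN | NaN |
| mean | NaN | NaN | 10006.0 | 52098.28 |
| std | NaN | NaN | 3.316625 | 13923.224633 |
| min | NaN | NaN | 10001.0 | 34323.44 |
| 25% | NaN | NaN | 10003.5 | 41823.24 |
| 50% | NaN | NaN | 10006.0 | 53423.27 |
| 75% | NaN | NaN | 10008.5 | 62048.225 |
| max | NaN | NaN | 10011.0 | 76422.34 |

`top` reports HR, but Marketing, Sales, Finance and IT also occur twice each; when values tie, `describe` shows only one of them.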


### Transposing

In [116]:
employee_records.T

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| employee_name | Sam | Max | Tony | Sarah | Tania | David | Mark | Alice | Charles | Bob | Anna |
| employee_dept | Research | HR | Marketing | Sales | Finance | IT | HR | Marketing | IT | Finance | Sales |
| employee_id | 10001 | 10002 | 10003 | 10004 | 10005 | 10006 | 10007 | 10008 | 10009 | 10010 | 10011 |
| salary | 45034.88 | 65343.45 | 53423.27 | 76422.34 | 58753.0 | 34323.44 | 66544.6 | 34354.66 | 55234.96 | 39078.6 | 44567.88 |


In [117]:
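# the frame before sorting, for reference
employee_records

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |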
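
Sorting by an axis, rather than by values, is done with `sort_index`: by default it sorts the row labels, and with `axis=1` it sorts the column labels. A sketch on a small illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({"salary": [3, 1, 2], "employee_id": [30, 10, 20]},
                  index=[2, 0, 1])

rows_sorted = df.sort_index()                # sort by row labels
rows_desc = df.sort_index(ascending=False)   # row labels, descending
cols_sorted = df.sort_index(axis=1)          # sort by column labels
print(rows_sorted.index.tolist())   # [0, 1, 2]
print(list(cols_sorted.columns))    # ['employee_id', 'salary']
```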


In [118]:
# sorting by values of a particular column
employee_records.sort_values('salary')

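|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 5 | David | IT | 10006 | 34323.44 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |
| 0 | Sam | Research | 10001 | 45034.88 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 1 | Max | HR | 10002 | 65343.45 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 3 | Sarah | Sales | 10004 | 76422.34 |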


In [119]:
employee_records.sort_values('salary',ascending=False)

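|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 1 | Max | HR | 10002 | 65343.45 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 0 | Sam | Research | 10001 | 45034.88 |
| 10 | Anna | Sales | 10011 | 44567.88 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 5 | David | IT | 10006 | 34323.44 |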
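
`sort_values` also accepts a list of columns, with a matching list for `ascending`. A sketch on four of the employees that sorts by department A to Z, then by salary from high to low within each department:

```python
import pandas as pd

records = pd.DataFrame({
    "employee_name": ["Max", "Sam", "Mark", "Tony"],
    "employee_dept": ["HR", "Research", "HR", "Marketing"],
    "salary": [65343.45, 45034.88, 66544.60, 53423.27],
})

# department ascending, then salary descending inside each department
ordered = records.sort_values(["employee_dept", "salary"], ascending=[True, False])
print(ordered["employee_name"].tolist())   # ['Mark', 'Max', 'Tony', 'Sam']
```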


# Getting a column

In [120]:
employee_records['employee_dept']

0      Research
1            HR
2     Marketing
3         Sales
4       Finance
5            IT
6            HR
7     Marketing
8            IT
9       Finance
10        Sales
Name: employee_dept, dtype: object

In [121]:
s1 = employee_records['salary']
isinstance(s1,pd.Series)

True

In [122]:
isinstance(employee_records,pd.DataFrame)

True

In [123]:
isinstance(s1,pd.DataFrame)

False

In [124]:
type(s1)

pandas.core.series.Series

In [125]:
type(employee_records)

pandas.core.frame.DataFrame

## Slicing

In [126]:
# slicing rows
employee_records[0:5] # positions 0 up to, but not including, 5

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |


In [127]:
employee_records

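|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |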


# Selection
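* `loc`: selects rows and columns by **label** (index and column names) or by a boolean array
* `iloc`: selects rows and columns by integer **position**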

In [128]:
employee_records.loc[2]

employee_name         Tony
employee_dept    Marketing
employee_id          10003
salary            53423.27
Name: 2, dtype: object

In [129]:
employee_records.iloc[2]

employee_name         Tony
employee_dept    Marketing
employee_id          10003
salary            53423.27
Name: 2, dtype: object
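
`loc[2]` and `iloc[2]` return the same row here only because the default RangeIndex makes labels and positions coincide. With a custom index they differ. A sketch using employee IDs as illustrative labels:

```python
import pandas as pd

records = pd.DataFrame(
    {"employee_name": ["Sam", "Max", "Tony"]},
    index=[4434, 3243, 3434],
)

by_label = records.loc[3434, "employee_name"]   # row whose label is 3434
by_position = records.iloc[0, 0]                # first row, first column
print(by_label, by_position)   # Tony Sam

# records.loc[0] would raise KeyError: there is no row labelled 0
```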

In [130]:
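# loc slices by label and includes the end label (rows 3 through 6)
employee_records.loc[3:6]

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |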


In [131]:
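# plain [] slicing is positional and excludes the end (rows 3, 4 and 5)
employee_records[3:6]

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |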


In [132]:
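# iloc slicing is also positional and excludes the end
employee_records.iloc[3:6]

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |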


In [133]:
# select specific rows by index label
employee_records.loc[[3,7]]

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 7 | Alice | Marketing | 10008 | 34354.66 |


In [134]:
isinstance(employee_records.loc[[3,7]], pd.DataFrame)

True

In [135]:
employee_records.loc[3,'employee_name']

'Sarah'

In [136]:
employee_records.iloc[3,3]

76422.34

In [137]:
employee_records

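|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |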


In [138]:
employee_records.loc[[False,False,False,True,False,False,True,False,False,False,False]]

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 6 | Mark | HR | 10007 | 66544.6 |


# Conditional Selection

In [139]:
# select records with salary greater than 60000
employee_records.loc[employee_records['salary'] > 60000]

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 1 | Max | HR | 10002 | 65343.45 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 6 | Mark | HR | 10007 | 66544.6 |


In [140]:
employee_records['salary'] > 60000

0     False
1      True
2     False
3      True
4     False
5     False
6      True
7     False
8     False
9     False
10    False
Name: salary, dtype: bool

In [141]:
employee_records.loc[employee_records['salary'] > 60000, ['employee_id']]

|   | employee_id |
|---|---|
| 1 | 10002 |
| 3 | 10004 |
| 6 | 10007 |


In [142]:
employee_records.loc[employee_records['salary'] > 60000, ['employee_id','employee_dept','salary']]

|   | employee_id | employee_dept | salary |
|---|---|---|---|
| 1 | 10002 | HR | 65343.45 |
| 3 | 10004 | Sales | 76422.34 |
| 6 | 10007 | HR | 66544.6 |


In [143]:
employee_records.employee_dept

0      Research
1            HR
2     Marketing
3         Sales
4       Finance
5            IT
6            HR
7     Marketing
8            IT
9       Finance
10        Sales
Name: employee_dept, dtype: object

In [144]:
employee_records['employee_dept']

0      Research
1            HR
2     Marketing
3         Sales
4       Finance
5            IT
6            HR
7     Marketing
8            IT
9       Finance
10        Sales
Name: employee_dept, dtype: object

In [145]:
employee_records.loc[employee_records['employee_dept'] == 'HR']

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 1 | Max | HR | 10002 | 65343.45 |
| 6 | Mark | HR | 10007 | 66544.6 |


### Selection with multiple conditions

In [146]:
employee_records.loc[(employee_records.employee_id >= 10004) & (employee_records.employee_dept == 'IT')]

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 5 | David | IT | 10006 | 34323.44 |
| 8 | Charles | IT | 10009 | 55234.96 |


In [147]:
employee_records.loc[(employee_records.salary >= 50000) & (employee_records.employee_dept == 'IT')]

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 8 | Charles | IT | 10009 | 55234.96 |


In [148]:
new_df = employee_records.loc[(employee_records.salary >= 50000) | (employee_records.employee_dept == 'IT')]
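
Each condition needs its own parentheses because `&`, `|` and `~` bind more tightly than comparisons such as `>=` and `==`. For several values in one column, `isin` is shorter than chaining `|`, and `~` inverts a mask. A sketch on four of the employees:

```python
import pandas as pd

records = pd.DataFrame({
    "employee_name": ["Sam", "Max", "David", "Charles"],
    "employee_dept": ["Research", "HR", "IT", "IT"],
    "salary": [45034.88, 65343.45, 34323.44, 55234.96],
})

# rows whose department is any of the listed values
hr_or_it = records.loc[records["employee_dept"].isin(["HR", "IT"])]

# ~ inverts a boolean mask: everyone outside IT
not_it = records.loc[~(records["employee_dept"] == "IT")]

print(hr_or_it["employee_name"].tolist())   # ['Max', 'David', 'Charles']
print(not_it["employee_name"].tolist())     # ['Sam', 'Max']
```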

## Copying your dataframe

In [149]:
temp_df = employee_records.copy()
temp_df

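|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |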


In [150]:
# assign 50000 to every column of the row labelled 2
temp_df.loc[2] = 50000

In [151]:
temp_df

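|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | 50000 | 50000 | 50000 | 50000.0 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |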


In [152]:
temp_df2 = employee_records.copy()
temp_df2.loc[[2,5],:] = 50000
temp_df2

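|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | 50000 | 50000 | 50000 | 50000.0 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | 50000 | 50000 | 50000 | 50000.0 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |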


In [153]:
temp_df2.loc[2]

employee_name      50000
employee_dept      50000
employee_id        50000
salary           50000.0
Name: 2, dtype: object

In [154]:
temp_df2.iloc[3]

employee_name       Sarah
employee_dept       Sales
employee_id         10004
salary           76422.34
Name: 3, dtype: object

In [155]:
temp_df

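|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | 50000 | 50000 | 50000 | 50000.0 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |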
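
Changes made to `temp_df` do not reach `employee_records` because `copy()` (a deep copy by default) creates an independent object. Plain assignment such as `alias = df` only creates a second name for the same object. A sketch contrasting the two; the names `original`, `alias` and `independent` are illustrative:

```python
import pandas as pd

original = pd.DataFrame({"employee_name": ["Sam", "Max", "Tony"]})

alias = original                 # same object, no copy
independent = original.copy()    # deep copy by default

independent.loc[2, "employee_name"] = "Changed"
after_copy_edit = original.loc[2, "employee_name"]    # still "Tony"

alias.loc[2, "employee_name"] = "Changed"
after_alias_edit = original.loc[2, "employee_name"]   # now "Changed"
print(after_copy_edit, after_alias_edit)   # Tony Changed
```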


In [156]:
# assigning to a column label that does not exist yet adds a new column
temp_df.loc[:,'EmployeeID'] = 10000

In [157]:
temp_df

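|   | employee_name | employee_dept | employee_id | salary | EmployeeID |
|---|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 | 10000 |
| 1 | Max | HR | 10002 | 65343.45 | 10000 |
| 2 | 50000 | 50000 | 50000 | 50000.0 | 10000 |
| 3 | Sarah | Sales | 10004 | 76422.34 | 10000 |
| 4 | Tania | Finance | 10005 | 58753.0 | 10000 |
| 5 | David | IT | 10006 | 34323.44 | 10000 |
| 6 | Mark | HR | 10007 | 66544.6 | 10000 |
| 7 | Alice | Marketing | 10008 | 34354.66 | 10000 |
| 8 | Charles | IT | 10009 | 55234.96 | 10000 |
| 9 | Bob | Finance | 10010 | 39078.6 | 10000 |
| 10 | Anna | Sales | 10011 | 44567.88 | 10000 |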


In [158]:
# assigning to an existing column overwrites every value in it
temp_df.loc[:,'employee_id'] = 10000

In [159]:
temp_df

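|   | employee_name | employee_dept | employee_id | salary | EmployeeID |
|---|---|---|---|---|---|
| 0 | Sam | Research | 10000 | 45034.88 | 10000 |
| 1 | Max | HR | 10000 | 65343.45 | 10000 |
| 2 | 50000 | 50000 | 10000 | 50000.0 | 10000 |
| 3 | Sarah | Sales | 10000 | 76422.34 | 10000 |
| 4 | Tania | Finance | 10000 | 58753.0 | 10000 |
| 5 | David | IT | 10000 | 34323.44 | 10000 |
| 6 | Mark | HR | 10000 | 66544.6 | 10000 |
| 7 | Alice | Marketing | 10000 | 34354.66 | 10000 |
| 8 | Charles | IT | 10000 | 55234.96 | 10000 |
| 9 | Bob | Finance | 10000 | 39078.6 | 10000 |
| 10 | Anna | Sales | 10000 | 44567.88 | 10000 |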


In [160]:
# select rows by integer position, in the order given
employee_records.iloc[[5,2]]

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 5 | David | IT | 10006 | 34323.44 |
| 2 | Tony | Marketing | 10003 | 53423.27 |


In [161]:
# select rows and columns by integer position
employee_records.iloc[[4,2],[1,3]]

|   | employee_dept | salary |
|---|---|---|
| 4 | Finance | 58753.0 |
| 2 | Marketing | 53423.27 |


In [162]:
# select a range of rows and columns; end positions are excluded
employee_records.iloc[0:4,1:4]

|   | employee_dept | employee_id | salary |
|---|---|---|---|
| 0 | Research | 10001 | 45034.88 |
| 1 | HR | 10002 | 65343.45 |
| 2 | Marketing | 10003 | 53423.27 |
| 3 | Sales | 10004 | 76422.34 |


In [163]:
employee_records

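|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |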


In [164]:
employee_records.iloc[0:4,:] # rows 0 to 3, all columns

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |


### Iterating over rows

In [165]:
for index, row in employee_records.iterrows():
    print(index, row, '\n')

0 employee_name         Sam
employee_dept    Research
employee_id         10001
salary           45034.88
Name: 0, dtype: object 

1 employee_name         Max
employee_dept          HR
employee_id         10002
salary           65343.45
Name: 1, dtype: object 

2 employee_name         Tony
employee_dept    Marketing
employee_id          10003
salary            53423.27
Name: 2, dtype: object 

3 employee_name       Sarah
employee_dept       Sales
employee_id         10004
salary           76422.34
Name: 3, dtype: object 

4 employee_name      Tania
employee_dept    Finance
employee_id        10005
salary           58753.0
Name: 4, dtype: object 

5 employee_name       David
employee_dept          IT
employee_id         10006
salary           34323.44
Name: 5, dtype: object 

6 employee_name       Mark
employee_dept         HR
employee_id        10007
salary           66544.6
Name: 6, dtype: object 

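7 employee_name        Alice
employee_dept    Marketing
employee_id          10008
salary            34354.66
Name: 7, dtype: object 

8 employee_name     Charles
employee_dept          IT
employee_id         10009
salary           55234.96
Name: 8, dtype: object 

9 employee_name        Bob
employee_dept    Finance
employee_id        10010
salary           39078.6
Name: 9, dtype: object 

10 employee_name        Anna
employee_dept       Sales
employee_id         10011
salary           44567.88
Name: 10, dtype: object 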

In [166]:
# the same loop can read a single column from each row
for index, row in employee_records.iterrows():
    print(index, row['salary'], '\n')

0 45034.88 

1 65343.45 

2 53423.27 

3 76422.34 

4 58753.0 

5 34323.44 

6 66544.6 

7 34354.66 

8 55234.96 

9 39078.6 

10 44567.88 



In [167]:
# collect the index labels and the salaries into two lists
indices = []
salaries = []
for index, row in employee_records.iterrows():
    indices.append(index)
    salaries.append(row['salary'])
salaries

[45034.88,
 65343.45,
 53423.27,
 76422.34,
 58753.0,
 34323.44,
 66544.6,
 34354.66,
 55234.96,
 39078.6,
 44567.88]
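
Looping with `iterrows` is fine for inspection, but it builds a Series for every row, does not preserve dtypes across columns, and can be slow on large frames. The same two lists can be read directly from the index and the column. A sketch on three of the salaries:

```python
import pandas as pd

records = pd.DataFrame({"salary": [45034.88, 65343.45, 53423.27]})

indices = records.index.tolist()         # row labels
salaries = records["salary"].tolist()    # column values, in row order

# the loop version, for comparison
loop_salaries = [row["salary"] for _, row in records.iterrows()]
print(salaries == loop_salaries)   # True
```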

## Selecting rows that contain a specific string

In [168]:
employee_records

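|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 1 | Max | HR | 10002 | 65343.45 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 6 | Mark | HR | 10007 | 66544.6 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |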


In [169]:
# to select employees whose name contains the letter i
employee_records.loc[employee_records['employee_name'].str.contains('i')] # case sensitive

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 7 | Alice | Marketing | 10008 | 34354.66 |


In [170]:
# to select employees whose department name contains the letter i
employee_records.loc[employee_records['employee_dept'].str.contains('i')] # case sensitive

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 9 | Bob | Finance | 10010 | 39078.6 |


In [171]:
employee_records.loc[employee_records['employee_dept'].str.contains('I')] # case sensitive

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 5 | David | IT | 10006 | 34323.44 |
| 8 | Charles | IT | 10009 | 55234.96 |


In [172]:
import re
employee_records.loc[employee_records['employee_dept'].str.contains('i|s', regex=True)] # 'i' or 's' anywhere in the department name (case sensitive)

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 0 | Sam | Research | 10001 | 45034.88 |
| 2 | Tony | Marketing | 10003 | 53423.27 |
| 3 | Sarah | Sales | 10004 | 76422.34 |
| 4 | Tania | Finance | 10005 | 58753.0 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 9 | Bob | Finance | 10010 | 39078.6 |
| 10 | Anna | Sales | 10011 | 44567.88 |


In [173]:
employee_records.loc[employee_records['employee_name'].str.contains('i|c', flags=re.I, regex=True)] # flags=re.I makes the match case-insensitive

|   | employee_name | employee_dept | employee_id | salary |
|---|---|---|---|---|
| 4 | Tania | Finance | 10005 | 58753.0 |
| 5 | David | IT | 10006 | 34323.44 |
| 7 | Alice | Marketing | 10008 | 34354.66 |
| 8 | Charles | IT | 10009 | 55234.96 |
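
`str.contains` also takes a `case` parameter, so a case-insensitive match can be written without importing `re`. A sketch on the same eleven names:

```python
import pandas as pd

names = pd.Series(["Sam", "Max", "Tony", "Sarah", "Tania", "David",
                   "Mark", "Alice", "Charles", "Bob", "Anna"])

# case=False lets the pattern 'i|c' match 'i', 'I', 'c' and 'C'
mask = names.str.contains("i|c", case=False, regex=True)
print(names[mask].tolist())   # ['Tania', 'David', 'Alice', 'Charles']
```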
