# Pandas Library

- Pandas is a Python library focused on efficient data manipulation and analysis.
- It simplifies tasks related to data cleaning, transformation, and analysis through its DataFrame and Series data structures.
- Pandas finds extensive use in data science for managing structured data, conducting statistical operations, and handling diverse file formats.
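This minimal sketch makes the two data structures concrete: a DataFrame is a table, and each of its columns is a Series. The column names and values are illustrative only.

```python
import pandas as pd

# A small DataFrame; the city names and numbers are illustrative only
df = pd.DataFrame({"city": ["jakarta", "bogor", "depok"],
                   "population": [750, 490, 350]})

# Selecting a single column returns a Series that shares the DataFrame's index
col = df["population"]
print(type(col).__name__)  # Series
print(col.tolist())        # [750, 490, 350]
```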

Import the required libraries

In [1]:
import pandas as pd
import numpy as np

## Series Object

A Pandas Series is like a column in a table. It is a one-dimensional array holding data of any type.

In [3]:
data = [0.25, 0.50, 0.75, 1]

In [4]:
print(data)

[0.25, 0.5, 0.75, 1]


Convert the list into a Series

In [6]:
data = pd.Series(data)

In [7]:
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

Get the values of the Series as a NumPy array

In [9]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])
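`.values` still works, but the pandas documentation recommends `Series.to_numpy()` (or `Series.array`) for new code. This short sketch compares the two on the same numbers:

```python
import numpy as np
import pandas as pd

data = pd.Series([0.25, 0.50, 0.75, 1])

arr = data.to_numpy()  # explicit conversion to a NumPy array
print(type(arr).__name__)                # ndarray
print(np.array_equal(arr, data.values))  # True
```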

Show the index

In [10]:
data.index

RangeIndex(start=0, stop=4, step=1)

Select an element by its index

In [11]:
data[2]

0.75

By default, a Series has an implicit index: the integer positions 0, 1, 2, and so on. We can also define our own labels, known as an explicit index. When defining an index, the number of labels must match the number of data points.
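Here is a quick check of the length rule. Passing three values with only two labels raises a `ValueError`; the exact message text may vary between pandas versions.

```python
import pandas as pd

try:
    pd.Series([0.25, 0.50, 0.75], index=["a", "b"])
    mismatch_error = None
except ValueError as exc:
    # pandas refuses to pair 3 values with 2 index labels
    mismatch_error = exc

print(type(mismatch_error).__name__)  # ValueError

# With matching lengths the Series is created normally
ok = pd.Series([0.25, 0.50, 0.75], index=["a", "b", "c"])
print(list(ok.index))  # ['a', 'b', 'c']
```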

In [13]:
data = pd.Series([0.25, 0.50, 0.75, 1], index=['a', 'b', 'c', 'd'])

In [14]:
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [15]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

In [16]:
data.index

Index(['a', 'b', 'c', 'd'], dtype='object')

Select data

In [17]:
# explicit index

data['a']

0.25

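Even though we have created an explicit index, we can still access elements by their implicit (positional) index. Recent pandas versions (2.1 and later) emit a `FutureWarning` for this positional fallback with `[]`; `data.iloc[3]` is the unambiguous alternative.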

In [18]:
# implicit index

data[3]

1.0

When the explicit index also consists of integers, an integer key passed to `[]` is treated as a label, not a position. Here `data_2[2]` returns the value labelled `2`, which sits at position 0.

In [19]:
data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2,5,3,7])

In [20]:
data_2[2]

0.25

For the same reason, `data_2[0]` raises a `KeyError`. pandas looks for the label `0`, which does not exist in `data_2`, and it does not fall back to position 0.

In [22]:
data_2[0]

KeyError: 0
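Because `[]` has to guess whether an integer is a label or a position, the explicit accessors `loc` (by label) and `iloc` (by position), covered later in this notebook, are the safer choice. This sketch uses the same `data_2`:

```python
import pandas as pd

data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2, 5, 3, 7])

print(data_2.loc[2])   # 0.25, the value labelled 2
print(data_2.iloc[2])  # 0.75, the value at position 2
print(data_2.iloc[0])  # 0.25, position 0 exists even though label 0 does not
```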

Data Slicing

In [23]:
data = pd.Series([0.25, 0.5, 0.75, 1], index=['a', 'b', 'c', 'd'])

In [24]:
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

For example, to select the data from `'b'` to `'c'`:

In [26]:
data['b':'c'] # explicit index

b    0.50
c    0.75
dtype: float64

Label-based slicing includes both endpoints. Slicing by the implicit index follows the usual Python convention instead: the stop position is excluded, so `data[1:2]` returns only the element at position 1.

In [27]:
data[1:2] # implicit index

b    0.5
dtype: float64
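Label slicing includes the stop label, but positional slicing excludes the stop position. To select the same two elements both ways, the positional stop must be one past the last position you want:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1], index=["a", "b", "c", "d"])

by_label = data["b":"c"]  # label slice: stop label 'c' is included
by_position = data[1:3]   # positional slice: stop position 3 is excluded

print(list(by_label.index))     # ['b', 'c']
print(list(by_position.index))  # ['b', 'c']
```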


#### Task 1

In [28]:
a = [9, 'World', 3.14, 7>6, 3+7j]

a = pd.Series(a)

In [29]:
a

0         9
1     World
2      3.14
3      True
4    (3+7j)
dtype: object
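The `object` dtype appears because the values have no common numeric type. This small sketch shows how the inferred dtype changes with the contents of the list:

```python
import pandas as pd

ints = pd.Series([9, 3, 1])
floats = pd.Series([9, 3.14])          # the int is upcast to float
mixed = pd.Series([9, 'World', 3.14])  # no common type, so object

print(ints.dtype)    # int64
print(floats.dtype)  # float64
print(mixed.dtype)   # object
```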

#### Task 2

In [30]:
a = [9, 'World', 3.14, 7>6, 3+7j]

a = pd.Series(a)

a.values

array([9, 'World', 3.14, True, (3+7j)], dtype=object)

In [31]:
a.index

RangeIndex(start=0, stop=5, step=1)

In [32]:
a[1:4]

1    World
2     3.14
3     True
dtype: object

#### Task 3

In [34]:
a = pd.Series([9, 'World', 3.14, 7>6, 3+7j], index=['a', 'b', 'c', 'd', 'e'])

a

a         9
b     World
c      3.14
d      True
e    (3+7j)
dtype: object

In [35]:
a.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [36]:
a['b':'d']

b    World
c     3.14
d     True
dtype: object

In [37]:
a[1:3]

b    World
c     3.14
dtype: object

#### Task 4

In [38]:
b = pd.Series([9, 'World', 3.14, 7>6, 3, 1, 'Hello', 7.12, 1>3, 10+2j], 
              index=[5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

b

5           9
6       World
7        3.14
8        True
9           3
10          1
11      Hello
12       7.12
13      False
14    (10+2j)
dtype: object

In [39]:
# explicit index
b[9]

3

In [40]:
# implicit index: 2 is not a label here, so this raises KeyError
b[2]

KeyError: 2

#### Task 5

In [41]:
b = pd.Series([9, 'World', 3.14, 7>6, 3, 1, 'Hello', 7.12, 1>3, 10+2j], 
              index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

b

a          9
b      World
c       3.14
d       True
e          3
f          1
g      Hello
h       7.12
i      False
j    (10+2j)
dtype: object

In [42]:
# explicit slicing
b['b':'i']

b    World
c     3.14
d     True
e        3
f        1
g    Hello
h     7.12
i    False
dtype: object

In [43]:
# implicit (positional) slicing with start and stop
parameter_2 = b[1:9]
print(parameter_2)

b    World
c     3.14
d     True
e        3
f        1
g    Hello
h     7.12
i    False
dtype: object


In [44]:
# implicit (positional) slicing with start, stop, and step
b[1:9:3]

b    World
e        3
h     7.12
dtype: object
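The same selections can be written with the explicit accessors described later in this notebook: `iloc` for positions and `loc` for labels. A `loc` label slice also accepts a step and, unlike the positional slice, includes the stop label. This sketch reuses the Task 5 data:

```python
import pandas as pd

b = pd.Series([9, 'World', 3.14, 7 > 6, 3, 1, 'Hello', 7.12, 1 > 3, 10 + 2j],
              index=list("abcdefghij"))

by_position = b.iloc[1:9:3]  # positions 1, 4, 7
by_label = b.loc['b':'i':3]  # labels 'b' through 'i' (inclusive), every third

print(list(by_position.index))  # ['b', 'e', 'h']
print(list(by_label.index))     # ['b', 'e', 'h']
```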

---

In [45]:
data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2,5,3,7])

In [46]:
data_2

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

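In [47]:
data_2[2:3]  # integer slice with []: positional, not label-based

3    0.75
dtype: float64

Although `data_2` has an integer index, slicing it with `[]` is positional: `data_2[2:3]` returns the element at position 2 (label `3`). This differs from single-element access, where `data_2[2]` is treated as a label. To slice by label, use `loc`.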

## loc and iloc

### loc (Location)
`loc` selects data by explicit index label. Slices with `loc` are label-based and include both the start and the stop label.

Select a single label

In [50]:
data_2.loc[3]

0.75

In [51]:
# label slice: from label 2 through label 3, inclusive
data_2.loc[2:3]

2    0.25
5    0.50
3    0.75
dtype: float64

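### iloc
`iloc` selects data by integer position (the implicit index), regardless of the labels. Like ordinary Python slicing, an `iloc` slice excludes the stop position.

On a DataFrame, `iloc` takes a row position and a column position; a Series has only one axis, so a single position is enough.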

In [52]:
data_2.iloc[3]

1.0

In [53]:
# positional slice: position 2 up to, but not including, position 3
data_2.iloc[2:3]

3    0.75
dtype: float64
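`loc` and `iloc` can assign values as well as read them. This minimal sketch works on a copy of `data_2` so the original stays unchanged:

```python
import pandas as pd

data_2 = pd.Series([0.25, 0.50, 0.75, 1], index=[2, 5, 3, 7])
updated = data_2.copy()  # work on a copy so data_2 is not modified

updated.loc[3] = 0.8   # set the value labelled 3 (position 2)
updated.iloc[0] = 0.2  # set the value at position 0 (label 2)

print(updated.tolist())  # [0.2, 0.5, 0.8, 1.0]
print(data_2.tolist())   # [0.25, 0.5, 0.75, 1.0]
```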

#### Task 1

In [55]:
data_1 = pd.Series([10, 11, 12, 13, 14], index=[1,2,3,4,5])

In [56]:
data_1

1    10
2    11
3    12
4    13
5    14
dtype: int64

In [57]:
data_1.loc[3]

12

In [58]:
data_1.iloc[3]

13

In [59]:
data_1[0]

KeyError: 0

In [60]:
data_1.iloc[0]

10

---

### Creating a Series from a Dictionary

In [61]:
dict_population = {'jakarta' : 750,
                'bogor' : 490,
                'depok' : 350,
                'tangerang' : 270,
                'bekasi' : 670}

In [62]:
dict_population

{'jakarta': 750, 'bogor': 490, 'depok': 350, 'tangerang': 270, 'bekasi': 670}

Convert the dictionary to a Series; the keys become the index labels.

In [63]:
population = pd.Series(dict_population)

In [64]:
population

jakarta      750
bogor        490
depok        350
tangerang    270
bekasi       670
dtype: int64
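Passing `index=` together with a dictionary selects and orders the keys, and any label missing from the dictionary gets `NaN`. In this sketch, `'bandung'` is a hypothetical extra label that is not in the original data:

```python
import pandas as pd

dict_population = {'jakarta': 750, 'bogor': 490, 'depok': 350,
                   'tangerang': 270, 'bekasi': 670}

# 'bandung' (hypothetical) is not a key, so its value becomes NaN,
# which also turns the dtype into float64
subset = pd.Series(dict_population, index=['depok', 'jakarta', 'bandung'])

print(list(subset.index))       # ['depok', 'jakarta', 'bandung']
print(subset.isna().tolist())   # [False, False, True]
print(subset.dtype)             # float64
```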

In [65]:
population.loc['depok']

350

In [66]:
population.iloc[2]

350
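Both lines return 350 because `'depok'` sits at position 2. `Index.get_loc` makes that label-to-position relationship explicit, as this minimal sketch shows:

```python
import pandas as pd

population = pd.Series({'jakarta': 750, 'bogor': 490, 'depok': 350,
                        'tangerang': 270, 'bekasi': 670})

pos = population.index.get_loc('depok')  # position of the label 'depok'
print(pos)                                              # 2
print(population.iloc[pos] == population.loc['depok'])  # True
```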

#### Task 2

In [77]:
exam_score = {'ana' : 80,
                'lia' : 87,
                'devi' : 93,
                'salsa' : 90,
                'kesya' : 89}

In [78]:
exam_score

{'ana': 80, 'lia': 87, 'devi': 93, 'salsa': 90, 'kesya': 89}

In [79]:
tscore = pd.Series(exam_score)

In [71]:
tscore

ana      80
lia      87
devi     93
salsa    90
kesya    89
dtype: int64

In [72]:
tscore.iloc[0]

80

In [73]:
tscore.loc['ana']

80

In [74]:
tscore.iloc[1]

87

In [75]:
tscore.loc['lia']

87

---