# **Data Manipulation**

Data manipulation is the process of changing the structure or arrangement of data so that it is easier to read and analyze. For example, data can be sorted alphabetically so that its owner can find useful information quickly. Another example is the manipulation of data on a website.

# Import the required libraries

In [None]:
import pandas as pd
import numpy as np

# Series Object

A Series is a one-dimensional object that holds a sequence of values, each associated with a data label. Together, the labels are called the index. To create a Series, pass an array-like object, such as a list, to the pd.Series constructor.

In [None]:
angka=[1, 1.5, 2, 2.5]

# Converting data into series

In [None]:
angka=pd.Series(angka)

In [None]:
angka

0    1.0
1    1.5
2    2.0
3    2.5
dtype: float64
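The text above mentions forming a Series from an array, while the cell uses a plain list. As a minimal sketch, the same Series can be built from a NumPy array; the variable names `values` and `series_from_array` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Build a Series from a NumPy array instead of a Python list.
values = np.array([1, 1.5, 2, 2.5])  # mixed ints and floats become float64
series_from_array = pd.Series(values)

print(series_from_array.dtype)  # float64
print(len(series_from_array))   # 4
```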

# Convert Series to Array

In [None]:
angka.values

array([1. , 1.5, 2. , 2.5])

The **index** attribute displays the index. The default index is a range whose start point is inclusive and whose stop point is exclusive, just like Python's built-in range.

In [None]:
angka.index

RangeIndex(start=0, stop=4, step=1)

In [None]:
list(range(1,10))

[1, 2, 3, 4, 5, 6, 7, 8, 9]
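To connect the two outputs above, the following sketch compares the default index of a Series with the equivalent built-in range. The name `index_labels` is only used for illustration.

```python
import pandas as pd

angka = pd.Series([1, 1.5, 2, 2.5])

# The default index behaves like range(0, 4): 0 is included, 4 is not.
index_labels = list(angka.index)
print(index_labels)                       # [0, 1, 2, 3]
print(index_labels == list(range(0, 4)))  # True
```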

# How to Access Data

In [None]:
angka #series

0    1.0
1    1.5
2    2.0
3    2.5
dtype: float64

In [None]:
angka[3]

2.5

**Explicit index** labels are the ones you define yourself. When defining the index, the number of labels must equal the number of values.

In [None]:
angka=pd.Series([1, 1.5, 2, 2.5], index=['a', 'i', 'u', 'e'])

In [None]:
angka

a    1.0
i    1.5
u    2.0
e    2.5
dtype: float64

In [None]:
angka.index

Index(['a', 'i', 'u', 'e'], dtype='object')
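The rule that the number of labels must match the number of values can be checked directly. This is a small sketch that deliberately passes too few labels; the flag `mismatch_rejected` is a hypothetical name used only to record the outcome.

```python
import pandas as pd

values = [1, 1.5, 2, 2.5]

try:
    # Four values but only three labels: pandas rejects this.
    pd.Series(values, index=['a', 'i', 'u'])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```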

# Check Data

In [None]:
#Explicit index

angka['i']

1.5

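Even after an explicit index has been defined, values can still be accessed through the implicit (positional) index. Recent pandas versions deprecate this positional fallback in favor of iloc.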

In [None]:
# Implicit index

angka[2]

2.0

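When the explicit index itself consists of integers, selecting with an integer uses the explicit index. A number that is not one of the labels therefore raises a KeyError, even if it would be a valid position.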

In [None]:
angka2=pd.Series([1, 1.5, 2, 2.5], index=[2,4,6,8])

In [None]:
angka2[6]

2.0

In [None]:
angka2[1]

KeyError: 1
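One way to avoid the KeyError above is to test for the label first, or to ask for a default value. The sketch below assumes the same `angka2` Series; `missing` and `found` are illustrative names.

```python
import pandas as pd

angka2 = pd.Series([1, 1.5, 2, 2.5], index=[2, 4, 6, 8])

# `in` tests the index labels, not the values.
print(1 in angka2)  # False: there is no label 1
print(6 in angka2)  # True

# .get returns a default instead of raising KeyError.
missing = angka2.get(1, default=None)
found = angka2.get(6)
print(missing, found)  # None 2.0
```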

# Data Slicing

In [None]:
angka=pd.Series([1, 1.5, 2, 2.5], index=['a', 'i', 'u', 'e'])

In [None]:
angka

a    1.0
i    1.5
u    2.0
e    2.5
dtype: float64

In [None]:
angka['i':'e'] #explicit index

i    1.5
u    2.0
e    2.5
dtype: float64
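Note that the label slice above includes its end label 'e'. A short sketch contrasting this with a positional slice, which excludes its stop position (`by_label` and `by_position` are illustrative names):

```python
import pandas as pd

angka = pd.Series([1, 1.5, 2, 2.5], index=['a', 'i', 'u', 'e'])

by_label = angka.loc['i':'e']  # end label 'e' is included
by_position = angka.iloc[1:3]  # stop position 3 is excluded

print(list(by_label.index))     # ['i', 'u', 'e']
print(list(by_position.index))  # ['i', 'u']
```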

# loc and iloc

In [None]:
angka2=pd.Series([1, 1.5, 2, 2.5], index=[2,4,6,8])

In [None]:
angka2

2    1.0
4    1.5
6    2.0
8    2.5
dtype: float64

In [None]:
angka2[8]  # explicit index: selecting

2.5

In [None]:
angka2[2:6]  # implicit index: slicing by position, stop excluded

6    2.0
8    2.5
dtype: float64

**1. loc**

The loc indexer always selects by explicit index label. When slicing with loc, the end label is included.

In [None]:
# loc

angka2.loc[2]

1.0

In [None]:
angka2.loc[2:6]

2    1.0
4    1.5
6    2.0
dtype: float64

**2. iloc**

The iloc indexer always selects by implicit integer position, whatever the index labels are. When slicing with iloc, the stop position is excluded, and positions beyond the end of the data are ignored.

In [None]:
#iloc

angka2.iloc[2]

2.0

In [None]:
angka2.iloc[2:6]

6    2.0
8    2.5
dtype: float64
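Because iloc works on positions rather than labels, it can also be used on a Series whose labels are strings. A minimal sketch, reusing the vowel-labelled Series from earlier (`third_value` and `tail` are illustrative names):

```python
import pandas as pd

angka = pd.Series([1, 1.5, 2, 2.5], index=['a', 'i', 'u', 'e'])

third_value = angka.iloc[2]  # position 2 holds label 'u'
print(third_value)           # 2.0

# Positions past the end are simply dropped in a slice.
tail = angka.iloc[2:6]
print(list(tail.index))      # ['u', 'e']
```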

# Data Frame

A DataFrame is a two-dimensional table in which each column is a Series, and all columns share the same index.

In [None]:
#The population data below is taken from the following link https://id.wikipedia.org/wiki/Daftar_negara_menurut_jumlah_penduduk#Referensi

dict_populasi = {'Indonesia':272229372,
                 'Singapura':5612300,
                 'Thailand':69037513,
                 'Rusia':146877088,
                 'Tiongkok':1416180000}

In [None]:
dict_populasi

{'Indonesia': 272229372,
 'Rusia': 146877088,
 'Singapura': 5612300,
 'Thailand': 69037513,
 'Tiongkok': 1416180000}

In [None]:
# Convert the dictionary to a Series

populasi = pd.Series(dict_populasi)

In [None]:
populasi

Indonesia     272229372
Singapura       5612300
Thailand       69037513
Rusia         146877088
Tiongkok     1416180000
dtype: int64

In [None]:
# Look up the population of a given country by its label

populasi.loc['Rusia']

146877088

In [None]:
# The same value can also be looked up by its integer position

populasi.iloc[3]

146877088

In [None]:
# The area data below is taken from https://id.wikipedia.org/wiki/Daftar_negara_menurut_luas_wilayah

dict_luas = {'Indonesia':1904569,
             'Singapura':726,
             'Thailand':513120,
             'Rusia':17098246,
             'Tiongkok':9596961}

In [None]:
# Convert the dictionary to a Series

luas = pd.Series(dict_luas)

In [None]:
luas

Indonesia     1904569
Singapura         726
Thailand       513120
Rusia        17098246
Tiongkok      9596961
dtype: int64

In [None]:
# Combine the populasi and luas Series into a DataFrame so that the data is easy to process and use

daerah = pd.DataFrame({'pop':populasi, 'Luas':luas})

In [None]:
daerah

| | pop | Luas |
|:---|---:|---:|
| Indonesia | 272229372 | 1904569 |
| Singapura | 5612300 | 726 |
| Thailand | 69037513 | 513120 |
| Rusia | 146877088 | 17098246 |
| Tiongkok | 1416180000 | 9596961 |
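To illustrate that a DataFrame is made of Series sharing one index, the sketch below builds a two-row version of the same table and inspects one column. The reduced data and the name `column` are only for illustration.

```python
import pandas as pd

populasi = pd.Series({'Indonesia': 272229372, 'Singapura': 5612300})
luas = pd.Series({'Indonesia': 1904569, 'Singapura': 726})
daerah = pd.DataFrame({'pop': populasi, 'Luas': luas})

column = daerah['Luas']
print(type(column).__name__)              # Series
print(column.index.equals(daerah.index))  # True
print(daerah.shape)                       # (2, 2)
```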


In [None]:
# Look up the area of a specific country

daerah['Luas']['Rusia']

17098246
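The lookup above chains two selections: first the column, then the row. As an alternative sketch, loc can take the row label and the column label in a single call. The two-row DataFrame and the name `area_rusia` are illustrative.

```python
import pandas as pd

daerah = pd.DataFrame(
    {'pop': [272229372, 146877088], 'Luas': [1904569, 17098246]},
    index=['Indonesia', 'Rusia'],
)

area_rusia = daerah.loc['Rusia', 'Luas']      # row label first, then column label
print(area_rusia)                             # 17098246
print(area_rusia == daerah['Luas']['Rusia'])  # True
```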

In [None]:
# Attribute access does not return the pop column here: pop is also a DataFrame method, so the method itself is shown

daerah.pop

<bound method DataFrame.pop of                   pop      Luas
Indonesia   272229372   1904569
Singapura     5612300       726
Thailand     69037513    513120
Rusia       146877088  17098246
Tiongkok   1416180000   9596961>
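The output above is a bound method rather than a column, because the column name pop collides with the existing DataFrame.pop method. A small sketch of the difference, using a reduced two-row table:

```python
import pandas as pd

daerah = pd.DataFrame({'pop': [272229372, 5612300], 'Luas': [1904569, 726]},
                      index=['Indonesia', 'Singapura'])

print(callable(daerah.pop))    # True: this is the DataFrame.pop method
print(daerah['pop'].tolist())  # [272229372, 5612300]
print(daerah.Luas.tolist())    # [1904569, 726] (no name clash, so this works)
```

Bracket selection works for every column name, which makes it the safer habit.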

In [None]:
# Select a specific column

daerah['pop']

Indonesia     272229372
Singapura       5612300
Thailand       69037513
Rusia         146877088
Tiongkok     1416180000
Name: pop, dtype: int64

# Rename the pop Column to Population

In [None]:
# Rebuild the DataFrame from the same Series under new column names

daerah = pd.DataFrame({'Population':populasi, 'Area':luas})

In [None]:
daerah

| | Population | Area |
|:---|---:|---:|
| Indonesia | 272229372 | 1904569 |
| Singapura | 5612300 | 726 |
| Thailand | 69037513 | 513120 |
| Rusia | 146877088 | 17098246 |
| Tiongkok | 1416180000 | 9596961 |
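The cell above gets new column names by rebuilding the DataFrame. When only an existing DataFrame is at hand, a possible alternative is the rename method, sketched here on a reduced table; `renamed` is an illustrative name.

```python
import pandas as pd

daerah = pd.DataFrame({'pop': [272229372, 5612300], 'Luas': [1904569, 726]},
                      index=['Indonesia', 'Singapura'])

# rename returns a new DataFrame; the original is left unchanged.
renamed = daerah.rename(columns={'pop': 'Population', 'Luas': 'Area'})
print(list(renamed.columns))  # ['Population', 'Area']
print(list(daerah.columns))   # ['pop', 'Luas']
```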


In [None]:
# Select a specific column

daerah['Population']

Indonesia     272229372
Singapura       5612300
Thailand       69037513
Rusia         146877088
Tiongkok     1416180000
Name: Population, dtype: int64

In [None]:
# Select specific rows of a column by label

daerah['Population']['Thailand':'Rusia']  # explicit

Thailand     69037513
Rusia       146877088
Name: Population, dtype: int64

In [None]:
# The same rows can be selected by integer position

daerah['Population'].iloc[2:4]  # implicit

Thailand     69037513
Rusia       146877088
Name: Population, dtype: int64
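Both selections above pick the column first and the rows second. As a sketch, loc and iloc on the DataFrame itself accept rows and column together and give the same result; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

daerah = pd.DataFrame(
    {'Population': [272229372, 5612300, 69037513, 146877088, 1416180000],
     'Area': [1904569, 726, 513120, 17098246, 9596961]},
    index=['Indonesia', 'Singapura', 'Thailand', 'Rusia', 'Tiongkok'],
)

by_label = daerah.loc['Thailand':'Rusia', 'Population']  # labels, end included
by_position = daerah.iloc[2:4, 0]                        # positions, stop excluded

print(by_label.equals(by_position))  # True
print(by_label.tolist())             # [69037513, 146877088]
```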

# Add New Column

In [None]:
# Add a derived column: population divided by area, that is, the population density

daerah['pop_area']=daerah['Population']/daerah['Area']

In [None]:
daerah

| | Population | Area | pop_area |
|:---|---:|---:|---:|
| Indonesia | 272229372 | 1904569 | 142.934896 |
| Singapura | 5612300 | 726 | 7730.440771 |
| Thailand | 69037513 | 513120 | 134.544576 |
| Rusia | 146877088 | 17098246 | 8.590185 |
| Tiongkok | 1416180000 | 9596961 | 147.565464 |
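Assigning to daerah['pop_area'] modifies the DataFrame in place. If the original should stay untouched, one option is the assign method, which returns a copy with the extra column. This sketch uses two of the rows above; `with_density` is an illustrative name.

```python
import pandas as pd

daerah = pd.DataFrame(
    {'Population': [5612300, 146877088], 'Area': [726, 17098246]},
    index=['Singapura', 'Rusia'],
)

with_density = daerah.assign(pop_area=daerah['Population'] / daerah['Area'])

print(list(daerah.columns))        # ['Population', 'Area']
print(list(with_density.columns))  # ['Population', 'Area', 'pop_area']
print(round(with_density.loc['Singapura', 'pop_area'], 6))  # 7730.440771
```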


# Add New Row

In [None]:
# Create the data for a new row to be added to the existing data

daerah_new=pd.DataFrame({"India":[1416180000,3287263,430.808244]})

In [None]:
# Display the new data

daerah_new

| | India |
|:---|---:|
| 0 | 1416180000.0 |
| 1 | 3287263.0 |
| 2 | 430.8082 |


In [None]:
# Transpose the new data so that India becomes a row

daerah_new=daerah_new.T

In [None]:
daerah_new

| | 0 | 1 | 2 |
|:---|---:|---:|---:|
| India | 1416180000.0 | 3287263.0 | 430.808244 |


In [None]:
# Give the new row the same column names as the existing DataFrame

daerah_new.columns=daerah.columns

In [None]:
daerah_new

| | Population | Area | pop_area |
|:---|---:|---:|---:|
| India | 1416180000.0 | 3287263.0 | 430.808244 |
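The three steps above (create a column, transpose, rename the columns) can also be collapsed into one. As a sketch, the row can be built directly with its label and column names; the `columns` list here stands in for daerah.columns.

```python
import pandas as pd

columns = ['Population', 'Area', 'pop_area']  # stands in for daerah.columns

# One row of values, labelled 'India', with the target column names.
daerah_new = pd.DataFrame([[1416180000, 3287263, 430.808244]],
                          index=['India'], columns=columns)

print(list(daerah_new.index))    # ['India']
print(list(daerah_new.columns))  # ['Population', 'Area', 'pop_area']
print(daerah_new.shape)          # (1, 3)
```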


# Combine Old and New Rows

In [None]:
# Finally, combine the existing data with the new row

pd.concat([daerah, daerah_new])

| | Population | Area | pop_area |
|:---|---:|---:|---:|
| Indonesia | 272229400.0 | 1904569.0 | 142.934896 |
| Singapura | 5612300.0 | 726.0 | 7730.440771 |
| Thailand | 69037510.0 | 513120.0 | 134.544576 |
| Rusia | 146877100.0 | 17098246.0 | 8.590185 |
| Tiongkok | 1416180000.0 | 9596961.0 | 147.565464 |
| India | 1416180000.0 | 3287263.0 | 430.808244 |
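In the combined output, Population and Area appear as floats because the new row was stored as floats, and concat picks a type that fits both inputs. A minimal sketch of casting the whole-number columns back to integers, using a reduced table; `combined` is an illustrative name, and note that the original cell displays the result without assigning it.

```python
import pandas as pd

daerah = pd.DataFrame({'Population': [272229372], 'Area': [1904569]},
                      index=['Indonesia'])
daerah_new = pd.DataFrame({'Population': [1416180000.0], 'Area': [3287263.0]},
                          index=['India'])

combined = pd.concat([daerah, daerah_new])
print(combined['Population'].dtype)  # float64: int64 and float64 were mixed

# Cast the whole-number columns back to integers.
combined = combined.astype({'Population': 'int64', 'Area': 'int64'})
print(combined['Population'].tolist())  # [272229372, 1416180000]
```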
