___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" 
alt="CLRSWY"></p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:100%; text-align:center; border-radius:10px 10px;">WAY TO REINVENT YOURSELF</p>

<img src=https://i.ibb.co/6gCsHd6/1200px-Pandas-logo-svg.png width="700" height="200">

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#060108; font-size:200%; text-align:center; border-radius:10px 10px;">Data Analysis with Python</p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#060108; font-size:150%; text-align:center; border-radius:10px 10px;">Session - 04</p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#4d77cf; font-size:200%; text-align:center; border-radius:10px 10px;">Pandas DataFrames</p>

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Content</p>

* [IMPORTING LIBRARIES NEEDED IN THIS NOTEBOOK](#0)
* [DATA FRAMES](#1)
* [CREATING A DATA FRAME](#2)
    * [Creating a DataFrame Using the Lists of Data & Columns](#2.1)
    * [Creating a DataFrame Using NumPy Arrays](#2.2)
    * [Creating a DataFrame Using a Dictionary](#2.3)
    * [The Examination of Some Attributes on Data](#2.4)
* [INDEXING, SLICING & SELECTION](#3)    
* [CREATING A NEW COLUMN](#4)    
* [REMOVING COLUMNS](#5)
* [REMOVING ROWS](#6)
* [SELECTING ROWS & COLUMNS USING .loc[ ] & .iloc[ ] ](#7)
* [CONDITIONAL SELECTION](#8)
    * [One Conditional Statement](#8.1)
    * [Two or More Conditional Statements](#8.2)
    * [Conditional Selection Using .loc[ ]](#8.3)
* [reset_index() & set_index()](#9)
* [Multi-Index & Index Hierarchy](#10)
* [Some Other Useful Methods with Iris Dataset](#11)
* [THE END OF THE SESSION-04](#12)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Importing Libraries Needed in This Notebook</p>

<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Once you've installed NumPy and Pandas, you can import them as libraries:

In [1]:
import numpy as np
import pandas as pd

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Data Frames</p>

<a id="1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

A DataFrame is a two-dimensional data container, similar to a matrix, but it can hold heterogeneous data, and its rows and columns can carry symbolic names (labels). ``DataFrames`` are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index.
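
To make the "Series sharing one index" idea concrete, here is a minimal sketch. The names `height`, `weight`, and `people`, and all the values, are made up for illustration and do not come from this session's data.

```python
import pandas as pd

# Two Series with the same index labels (illustrative values)
height = pd.Series([1.70, 1.82, 1.65], index=["ann", "bob", "cem"], name="height")
weight = pd.Series([62, 80, 55], index=["ann", "bob", "cem"], name="weight")

# Placing the Series side by side gives a DataFrame with one shared index
people = pd.concat([height, weight], axis=1)
print(people)

# Each column can hold a different data type (heterogeneous data)
print(people.dtypes)

# A single column taken back out of the DataFrame is a Series again
print(type(people["height"]))
```

Each Series becomes one column, and the Series name becomes the column label.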

### Why use Pandas?

Data scientists make use of Pandas in Python for its **following advantages**:

- It handles missing data easily
- It uses Series for one-dimensional data and DataFrame for two-dimensional data
- It provides an efficient way to slice the data
- It provides a flexible way to merge, concatenate or reshape the data
- It includes a powerful time series tool to work with

In a nutshell, Pandas is a useful library for data manipulation and analysis. It provides powerful, easy-to-use data structures, as well as the means to quickly perform operations on them.
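
A few of the advantages listed above can be sketched in a handful of lines. This is only an illustration under assumed data: the frames `sales` and `cities` and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value in "units" is missing
sales = pd.DataFrame({"store": ["A", "B", "C"], "units": [10.0, np.nan, 7.0]})
cities = pd.DataFrame({"store": ["A", "B", "C"], "city": ["Izmir", "Ankara", "Bursa"]})

# Missing data: count the missing values, then fill them with 0
n_missing = sales["units"].isna().sum()
filled = sales.fillna({"units": 0})

# Merging: combine the two frames on their common "store" column
merged = filled.merge(cities, on="store")

# Slicing: take the first two rows by position
first_two = merged.iloc[:2]
print(n_missing)
print(merged)
print(first_two)
```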

[SOURCE01](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html), 
[SOURCE02](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), 
[SOURCE03](https://morioh.com/p/2528ac775b1b), 
[SOURCE04](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python), 
[SOURCE05](https://www.guru99.com/python-pandas-tutorial.html), 
[SOURCE06](https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm), 
[SOURCE07](https://realpython.com/pandas-dataframe/) &
[SOURCE08](https://towardsdatascience.com/a-simple-guide-to-pandas-dataframes-b125f64e1453)<br>
[VIDEO SOURCE01](https://www.youtube.com/watch?v=zmdjNSmRXF4), 
[VIDEO SOURCE02](https://www.youtube.com/watch?v=F6kmIpWWEdU) &
[VIDEO SOURCE03](https://towardsdatascience.com/pandas-dataframe-basics-3c16eb35c4f3)<br>

**Now let's use pandas to explore this topic!**

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Creating a DataFrame</p>

<a id="2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

A **``DataFrame``** is a **two-dimensional collection of data**. It is a data structure in which data is stored in **tabular form**, arranged in rows and columns. We can perform various operations on a DataFrame, such as selecting rows and columns or adding new rows and columns.

We can load a DataFrame from external storage such as a SQL database, a CSV file, or an Excel file. We can also build one from Python objects such as lists, a dictionary, or a list of dictionaries.
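
As a small sketch of two of these routes, the example below builds the same table from a list of dictionaries and from CSV text. The CSV is read from an in-memory string (`io.StringIO`) so the example runs without a file on disk; the records themselves are made up.

```python
import io
import pandas as pd

# From a list of dictionaries: each dictionary becomes one row
records = [{"name": "Ali", "age": 30}, {"name": "Ayse", "age": 25}]
df_records = pd.DataFrame(records)

# From CSV: read_csv accepts a file path or any file-like object
csv_text = "name,age\nAli,30\nAyse,25\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_records)
print(df_csv)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the CSV file.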

In this session, we will learn to create the DataFrame in multiple ways. Let's understand these different ways.

**``pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)``**
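
The sketch below uses several parameters of this signature at once. The labels (`r1`, `c1`, ...) and the values are arbitrary and only serve as an illustration.

```python
import pandas as pd

scores = pd.DataFrame(
    data=[[1, 2], [3, 4], [5, 6]],   # the values, row by row
    index=["r1", "r2", "r3"],        # row labels
    columns=["c1", "c2"],            # column labels
    dtype="float64",                 # force a single dtype for all columns
)
print(scores)
print(scores.dtypes)
```

If `index` or `columns` is left out, pandas falls back to the default integer labels 0, 1, 2, ...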

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Creating a DataFrame Using the Lists of Data & Columns</p>

<a id="2.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [3]:
data = [1, 2, 3, 4]
data

[1, 2, 3, 4]

In [4]:
pd.Series(data)

0    1
1    2
2    3
3    4
dtype: int64

In [6]:
pd.Series(data, name="FB")  # name sets the name of the Series

0    1
1    2
2    3
3    4
Name: FB, dtype: int64

In [5]:
pd.DataFrame(data)  # the rows are labeled 0, 1, 2, 3; no column name was given, so the column is named 0

Unnamed: 0,0
0,1
1,2
2,3
3,4


In [8]:
pd.DataFrame(data, columns="column1")  # a value passed positionally after data would be taken as index (the second parameter), so columns is given by keyword
# the error here occurs because columns must be a collection (e.g. a list or tuple), not a bare string

TypeError: Index(...) must be called with a collection of some kind, 'column1' was passed

In [11]:
pd.DataFrame(data, columns=["column1"])

Unnamed: 0,column1
0,1
1,2
2,3
3,4


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Creating a DataFrame Using NumPy Arrays</p>

<a id="2.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [35]:
birgül = np.arange(1, 24, 2).reshape(3, 4)
birgül

array([[ 1,  3,  5,  7],
       [ 9, 11, 13, 15],
       [17, 19, 21, 23]])

In [38]:
pd.DataFrame(data=birgül)

Unnamed: 0,0,1,2,3
0,1,3,5,7
1,9,11,13,15
2,17,19,21,23


In [39]:
pd.DataFrame(data=birgül, columns=["Pakize", "Sam", "Owen", "Oguzhan"])  # names the columns

Unnamed: 0,Pakize,Sam,Owen,Oguzhan
0,1,3,5,7
1,9,11,13,15
2,17,19,21,23


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Creating a DataFrame Using a Dictionary</p>

<a id="2.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [40]:
s1 = np.random.randint(2, 10, size=4)
s2 = np.random.randint(3, 10, size=4)
s3 = np.random.randint(4, 15, size=4)

In [41]:
s1

array([5, 2, 3, 8])

In [42]:
s2

array([3, 5, 6, 7])

In [43]:
s3

array([ 8, 13, 13,  6])

In [44]:
myDict = {"var1": s1, "var2": s2, "var3": s3}
myDict

{'var1': array([5, 2, 3, 8]),
 'var2': array([3, 5, 6, 7]),
 'var3': array([ 8, 13, 13,  6])}

In [45]:
df = pd.DataFrame(myDict)
df

Unnamed: 0,var1,var2,var3
0,5,3,8
1,2,5,13
2,3,6,13
3,8,7,6


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">The Examination of Some Attributes on Data</p>

<a id="2.4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [46]:
df.head(2)

Unnamed: 0,var1,var2,var3
0,5,3,8
1,2,5,13


In [47]:
df.tail(2)

Unnamed: 0,var1,var2,var3
2,3,6,13
3,8,7,6


In [49]:
df.sample(2)  # the rows returned change on every run

Unnamed: 0,var1,var2,var3
1,2,5,13
3,8,7,6


In [50]:
df.columns

Index(['var1', 'var2', 'var3'], dtype='object')

In [52]:
for i in df.columns:
    print(i)

var1
var2
var3


In [55]:
for i in df.columns:
    print(df[i].mean())

4.5
5.25
10.0


In [56]:
df.mean()

var1     4.50
var2     5.25
var3    10.00
dtype: float64

In [57]:
df.index

RangeIndex(start=0, stop=4, step=1)

In [58]:
[i for i in df.index]

[0, 1, 2, 3]

In [43]:
df.columns

Index(['var1', 'var2', 'var3'], dtype='object')

In [59]:
df.columns = ["new1", "new2", "new3"]

In [60]:
df

Unnamed: 0,new1,new2,new3
0,5,3,8
1,2,5,13
2,3,6,13
3,8,7,6


In [62]:
df.index = ["a", "b", "c", "d"]  # label index; the integer positions (0, 1, 2, 3) are still available in the background via .iloc
df

Unnamed: 0,new1,new2,new3
a,5,3,8
b,2,5,13
c,3,6,13
d,8,7,6


In [63]:
df.rename(columns = {"new1": "a", "new2": "b"})  # does not change df permanently
# to make the change permanent, either assign the result back to df or pass inplace=True (the default is False)

Unnamed: 0,a,b,new3
a,5,3,8
b,2,5,13
c,3,6,13
d,8,7,6
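
The difference between a temporary and a permanent rename can be checked directly. A minimal sketch on a small stand-in frame (the name `frame` is hypothetical; the column names mirror the ones used in this session):

```python
import pandas as pd

frame = pd.DataFrame({"new1": [5, 2], "new2": [3, 5]})

# rename returns a renamed copy; frame itself is untouched
frame.rename(columns={"new1": "a"})
columns_after_plain_call = list(frame.columns)

# assigning the result back keeps the change
frame = frame.rename(columns={"new1": "a"})
columns_after_assignment = list(frame.columns)

print(columns_after_plain_call)
print(columns_after_assignment)
```

Assigning the result back is generally the easier pattern to read and chain; `inplace=True` achieves the same effect on the original object.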


In [64]:
df.rename(index = {"a": 1, "b": 2})  # does not change df permanently
# to make the change permanent, either assign the result back to df or pass inplace=True (the default is False)

Unnamed: 0,new1,new2,new3
1,5,3,8
2,2,5,13
c,3,6,13
d,8,7,6


In [65]:
df.shape  # (rows, columns): 4 x 3

(4, 3)

In [66]:
df.shape[0]  # number of rows: the 4 in 4 x 3

4

In [67]:
df.shape[1]  # number of columns: the 3 in 4 x 3

3

In [69]:
len(df)  # the length of a DataFrame is its number of rows

4

In [70]:
df.ndim  # number of dimensions

2

In [71]:
df.values  # returns the data as a NumPy array

array([[ 5,  3,  8],
       [ 2,  5, 13],
       [ 3,  6, 13],
       [ 8,  7,  6]])

In [72]:
type(df)  # DataFrame

pandas.core.frame.DataFrame

In [73]:
type(df.values)

numpy.ndarray

In [74]:
type(df["new1"])  # Series: a DataFrame is made of Series placed side by side

pandas.core.series.Series

In [75]:
"new2" in df

True

In [76]:
"Ali" in df

False

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Indexing, Slicing & Selection</p>

<a id="3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let's learn a variety of methods to grab data from a DataFrame:

In [77]:
from numpy.random import randn

In [78]:
'A B C D'.split()  # shown to illustrate the usage below: split() breaks the string at the spaces

['A', 'B', 'C', 'D']

In [79]:
np.random.seed(101)  # fixes the seed so that the same random numbers are produced wherever this code runs

df = pd.DataFrame(randn(5, 4), index = 'A B C D E'.split(), columns = 'W X Y Z'.split())
df

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [80]:
df['Y']  # selects a single column

A    0.907969
B   -0.848077
C    0.528813
D   -0.933237
E    2.605967
Name: Y, dtype: float64

In [82]:
type(df['Y'])  # a single column is a Series

pandas.core.series.Series

In [83]:
df[['Y']]  # with a second pair of square brackets the result is a DataFrame

Unnamed: 0,Y
A,0.907969
B,-0.848077
C,0.528813
D,-0.933237
E,2.605967


In [84]:
type(df[['Y']])  # DataFrame

pandas.core.frame.DataFrame

In [86]:
df.Y  # attribute-style access (referred to as "SQL syntax"); it works but is not recommended

A    0.907969
B   -0.848077
C    0.528813
D   -0.933237
E    2.605967
Name: Y, dtype: float64

In [87]:
df[['Y', 'Z']]

Unnamed: 0,Y,Z
A,0.907969,0.503826
B,-0.848077,0.605965
C,0.528813,-0.589001
D,-0.933237,0.955057
E,2.605967,0.683509


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Creating a New Column</p>

<a id="4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [78]:
df["new_column"] = df["X"] * df["Y"]  # no output is displayed, which indicates the change was made to df itself

In [79]:
df

Unnamed: 0,W,X,Y,Z,new_column
A,2.70685,0.628133,0.907969,0.503826,0.570325
B,0.651118,-0.319318,-0.848077,0.605965,0.270806
C,-2.018168,0.740122,0.528813,-0.589001,0.391387
D,0.188695,-0.758872,-0.933237,0.955057,0.708208
E,0.190794,1.978757,2.605967,0.683509,5.156577


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Removing Columns</p>

<a id="5"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [82]:
df.drop("new_column", axis=1)  # drops the newly created column in the returned result; df itself is unchanged

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [83]:
df.drop("new_column", axis=1, inplace=True)  # drops the newly created column permanently

In [84]:
df

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Removing Rows</p>

<a id="6"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [85]:
df.drop("C")  # an output is displayed, so df itself was not changed

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [86]:
df

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [87]:
df_temp = df.drop("C")  # instead of inplace=True, the result is assigned to another variable

In [88]:
df_temp

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Selecting Rows and Columns using .loc[ ] and .iloc[ ]</p>

<a id="7"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

#### `.loc[]` → allows us to select data using **labels** (names) of rows (index) & columns

#### `.iloc[]` → allows us to select data using **integer positions** of rows & columns. It follows classic Python indexing logic
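
The practical difference shows up most clearly when slicing. A minimal sketch on a tiny frame (the name `letters` and its values are made up for illustration):

```python
import pandas as pd

letters = pd.DataFrame({"val": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# .loc slices by label and includes BOTH endpoints
by_label = letters.loc["a":"c"]

# .iloc slices by integer position and excludes the end, like a Python list
by_position = letters.iloc[0:2]

# The same cell reached in both ways
cell_by_label = letters.loc["c", "val"]
cell_by_position = letters.iloc[2, 0]

print(by_label)
print(by_position)
print(cell_by_label, cell_by_position)
```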

In [89]:
data = np.random.randint(1, 40, size=(8, 4))
df = pd.DataFrame(data, columns = ["var1", "var2", "var3", 'var4'])
df

Unnamed: 0,var1,var2,var3,var4
0,8,11,39,10
1,19,8,16,1
2,13,18,12,16
3,34,30,25,37
4,20,36,31,11
5,21,28,9,23
6,27,24,38,23
7,10,3,19,29


In [None]:
df.loc[4]  # returns a Series; .loc selects along axis=0 (rows)

In [None]:
df.loc[[4]]  # returns a DataFrame

In [None]:
df.loc[2:5]  # .loc works on labels and includes the end label, so rows 2, 3, 4 and 5 are returned

In [None]:
df.iloc[2:5]  # .iloc works on integer positions and excludes the end, so rows 2, 3 and 4 are returned

In [None]:
df.index = 'a b c d e f g h'.split()
df

In [None]:
df.iloc[1:4]

In [None]:
# df.loc[1:4]  # raises an error: the index labels are now strings, so integers cannot be used as labels with .loc

In [None]:
df.loc['c':'g']

In [None]:
df.loc['d','var3']

In [None]:
df.iloc[3, 2]

In [None]:
df.loc['d':'g', 'var2']

In [None]:
df.loc['d':'g']['var3']   # returns a Series

In [None]:
# How can we select this data as a DataFrame rather than a Series?
# First way

df.loc['d':'g'][['var3']]

In [None]:
# Second way

df.loc['d':'g', ["var3"]]

In [None]:
df.loc['d':'g'][["var2","var3"]]

In [None]:
df.iloc[2:5, 2]

In [None]:
df.iloc[2:5, [2]]

In [None]:
# df.iloc[2:5][[2]]  # raises a KeyError: the trailing [[ ]] expects column labels, and 2 is not a column name

In [None]:
df.iloc[2:5][['var3']]

In [None]:
df

In [None]:
df.loc['a','var1']

In [None]:
# let's select the same data as a DataFrame

df.loc[['a'], ['var1']]

In [None]:
df.loc[['a','c'],['var1','var3']]

In [None]:
df.iloc[[0, 2], [0, 2]]

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Conditional Selection</p>

<a id="8"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

An important feature of pandas is conditional selection using bracket notation, very similar to numpy:
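
As a small illustration of that similarity, the same boolean mask idea is applied below to a NumPy array and to a pandas Series. The values and the names `arr` and `ser` are made up.

```python
import numpy as np
import pandas as pd

arr = np.array([4, 15, 8, 23])
arr_selected = arr[arr > 10]          # NumPy: keep the elements where the mask is True

ser = pd.Series(arr, index=["a", "b", "c", "d"])
ser_selected = ser[ser > 10]          # pandas: same syntax, and the labels are kept

print(arr_selected)
print(ser_selected)
```

The pandas result keeps the index labels of the selected elements, which a plain NumPy array does not have.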

In [None]:
df

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">One Conditional Statement</p>

<a id="8.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
df > 10  # returns a DataFrame of boolean values

In [None]:
df[df > 10]  # keeps the values greater than 10; all other cells become NaN

In [None]:
df[df['var1'] > 10]  # the rows where var1 is greater than 10

In [None]:
df[df['var1'] > 10]['var2']  # the var2 values for the rows where var1 is greater than 10

In [None]:
df[df['var1'] > 10][['var2', "var3"]]  # both the var2 and var3 values for the rows where var1 is greater than 10

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Two or More Conditional Statements</p>

<a id="8.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

**For two or more conditions, you can use | → or, & → and, with parentheses around each condition:**
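
A minimal sketch of combining conditions on a small made-up frame `t` is shown here. The parentheses matter because `&` and `|` bind more tightly than comparison operators such as `>`. The `~` operator (logical not) is included as a related operator that the text above does not mention.

```python
import pandas as pd

t = pd.DataFrame({"var1": [8, 19, 13, 34], "var2": [11, 8, 18, 30]})

# and: both conditions must hold
both = t[(t["var1"] > 10) & (t["var2"] > 10)]

# or: at least one condition must hold
either = t[(t["var1"] > 30) | (t["var2"] < 10)]

# not: invert a condition
negated = t[~(t["var1"] > 10)]

print(both)
print(either)
print(negated)
```

Python's keywords `and` / `or` do not work element-wise on Series, which is why the `&` / `|` operators are used.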

In [None]:
df[(df['var1'] > 10) & (df['var1'] < 20)]  # two or more conditions can be combined: | → or, & → and

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Conditional Selection Using .loc[ ]</p>

<a id="8.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
df.loc[(df["var1"] > 10), ['var2', 'var3']]

In [None]:
df.loc[((df["var1"] < 10) | (df["var1"] > 30)), ['var2','var3']]

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">reset_index() & set_index()</p>

<a id="9"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let's discuss some more features of indexing, including resetting the index or setting it to something else. We'll also talk about index hierarchy!
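
The two methods are roughly inverses of each other: `set_index()` moves a column into the index, and `reset_index()` moves the index back into a column. A minimal sketch on a hypothetical frame `emp`:

```python
import pandas as pd

emp = pd.DataFrame({"id": [101, 102, 103], "name": ["Ada", "Bo", "Cy"]})

# Move the "id" column into the index; emp itself is not modified
by_id = emp.set_index("id")
name_of_102 = by_id.loc[102, "name"]   # rows can now be looked up by id

# Move the index back into a regular column and restore the default 0..n-1 index
restored = by_id.reset_index()

print(by_id)
print(name_of_102)
print(restored)
```

Both methods return a new DataFrame by default, so the result has to be assigned (or `inplace=True` passed) for the change to persist.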

In [None]:
df

In [None]:
# Reset to default 0, 1...n index

df.reset_index()  # the index that existed before the reset is kept as a new column

In [None]:
df.reset_index(drop=True)  # drop=True discards the old index instead of keeping it as a column

In [None]:
df.reset_index(drop=True, inplace=True)  # inplace=True makes the change permanent
df

In [None]:
df.set_index('var4')

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Multi-Index & Index Hierarchy</p>

<a id="10"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let us go over how to work with a Multi-Index. First, we'll create a quick example of what a Multi-Indexed DataFrame looks like:

In [90]:
outside = ['M1', 'M1', 'M1', 'M2', 'M2', 'M2','M3', 'M3', 'M3']
inside = [1, 2, 3, 1, 2, 3, 5, 6, 7]
multi_index = list(zip(outside, inside))
multi_index

[('M1', 1),
 ('M1', 2),
 ('M1', 3),
 ('M2', 1),
 ('M2', 2),
 ('M2', 3),
 ('M3', 5),
 ('M3', 6),
 ('M3', 7)]

In [91]:
hier_index = pd.MultiIndex.from_tuples(multi_index)
hier_index

MultiIndex([('M1', 1),
            ('M1', 2),
            ('M1', 3),
            ('M2', 1),
            ('M2', 2),
            ('M2', 3),
            ('M3', 5),
            ('M3', 6),
            ('M3', 7)],
           )

In [92]:
np.random.seed(101)
df = pd.DataFrame(np.random.randn(9, 4), index = hier_index, columns=['A', 'B', 'C', 'D'])
df

Unnamed: 0,Unnamed: 1,A,B,C,D
M1,1,2.70685,0.628133,0.907969,0.503826
M1,2,0.651118,-0.319318,-0.848077,0.605965
M1,3,-2.018168,0.740122,0.528813,-0.589001
M2,1,0.188695,-0.758872,-0.933237,0.955057
M2,2,0.190794,1.978757,2.605967,0.683509
M2,3,0.302665,1.693723,-1.706086,-1.159119
M3,5,-0.134841,0.390528,0.166905,0.184502
M3,6,0.807706,0.07296,0.638787,0.329646
M3,7,-0.497104,-0.75407,-0.943406,0.484752


**``Note``** that all of the MultiIndex constructors accept a `names` argument, which stores string names for the levels themselves. If no names are provided, `None` is assigned to each level.

For more information on indexing and selecting data, visit [**Pandas Official Documentation**](https://pandas.pydata.org/pandas-docs/version/0.13.0/indexing.html)
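
A minimal sketch of the `names` argument mentioned in the note, using a shortened version of the tuples from this section. The level names `Group` and `Num` are just labels chosen for illustration.

```python
import pandas as pd

tuples = [("M1", 1), ("M1", 2), ("M2", 1)]

# Level names given at construction time
named = pd.MultiIndex.from_tuples(tuples, names=["Group", "Num"])

# Without names, every level name is None
unnamed = pd.MultiIndex.from_tuples(tuples)

print(list(named.names))
print(list(unnamed.names))
```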

In [None]:
df.index  # shows the index: one level holds M1, M2, M3 and the other holds 1, 2, 3, 1, 2, 3, 5, 6, 7

In [94]:
df.index.names  # neither level has a name yet, so both are None

FrozenList([None, None])

In [95]:
df.index.names = ["Group", "Num"]  # names the index levels
df.index.names

In [None]:
df  # now displayed with the level names

In [96]:
df.index.levels  # the unique values of each index level

FrozenList([['M1', 'M2', 'M3'], [1, 2, 3, 5, 6, 7]])

In [97]:
df.index.get_level_values  # without parentheses only the bound method is displayed; call it with a level number or name

<bound method MultiIndex.get_level_values of MultiIndex([('M1', 1),
            ('M1', 2),
            ('M1', 3),
            ('M2', 1),
            ('M2', 2),
            ('M2', 3),
            ('M3', 5),
            ('M3', 6),
            ('M3', 7)],
           names=['Group', 'Num'])>

In [None]:
df.index.get_level_values(0)

In [None]:
df.index.get_level_values("Group")

In [None]:
df.index.get_level_values(1)

Now let's see how to index this! For an index hierarchy on the rows we use ``df.loc[]``; if the hierarchy were on the columns axis, we would just use normal bracket notation ``df[]``. Selecting one level of the index returns the sub-DataFrame:

In [None]:
df[["A"]]  # single brackets [] return a Series, double brackets [[]] return a DataFrame

In [None]:
df[["A","B"]]  # more than one column must be passed as a list; df["A", "B"] with single brackets raises an error

In [None]:
df.loc['M1']

In [None]:
df.loc[("M1", 2)]  # Series 

In [None]:
df.loc[[("M1", 2)]]  # DataFrame

In [None]:
df.loc["M1", "A":"C"]

In [None]:
df.loc[[("M1", 2)], "A":"C"]

In [None]:
df.loc["M1":"M2"]

In [None]:
df.loc[("M1", 2):("M2", 3)]
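
The selections above can be summarized on a smaller frame whose values are easy to follow. This is only a sketch: the frame `mi` is made up, and `DataFrame.xs` is included as a related pandas method for selecting on the inner level, which this session does not otherwise cover.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("M1", 1), ("M1", 2), ("M2", 1), ("M2", 2)], names=["Group", "Num"]
)
mi = pd.DataFrame(np.arange(8).reshape(4, 2), index=idx, columns=["A", "B"])

outer = mi.loc["M1"]             # outer level only: sub-DataFrame indexed by Num
row = mi.loc[("M2", 1)]          # both levels as a tuple: one row, returned as a Series
cell = mi.loc[("M2", 1), "B"]    # row tuple plus column label: a single value
inner = mi.xs(1, level="Num")    # inner level only: every group where Num == 1

print(outer)
print(row)
print(cell)
print(inner)
```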

For more information on MultiIndex and advanced indexing, visit [**Pandas Official Documentation**](https://pandas.pydata.org/docs/user_guide/advanced.html)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Some Other Useful Methods with Iris Dataset</p>

<a id="11"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

### Let's apply the functions/attributes/methods we have learned to the "iris dataset"

In [98]:
import seaborn as sns

In [99]:
sns.get_dataset_names()  # the datasets bundled with seaborn

['anagrams',
 'anscombe',
 'attention',
 'brain_networks',
 'car_crashes',
 'diamonds',
 'dots',
 'exercise',
 'flights',
 'fmri',
 'gammas',
 'geyser',
 'iris',
 'mpg',
 'penguins',
 'planets',
 'taxis',
 'tips',
 'titanic']

In [103]:
df = sns.load_dataset('iris')  # loads the iris dataset
df

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica


In [104]:
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [105]:
df.tail()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica
149,5.9,3.0,5.1,1.8,virginica


In [106]:
df.shape

(150, 5)

In [107]:
df.ndim

2

In [108]:
df.sample(15)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
30,4.8,3.1,1.6,0.2,setosa
66,5.6,3.0,4.5,1.5,versicolor
117,7.7,3.8,6.7,2.2,virginica
96,5.7,2.9,4.2,1.3,versicolor
109,7.2,3.6,6.1,2.5,virginica
93,5.0,2.3,3.3,1.0,versicolor
74,6.4,2.9,4.3,1.3,versicolor
55,5.7,2.8,4.5,1.3,versicolor
72,6.3,2.5,4.9,1.5,versicolor
44,5.1,3.8,1.9,0.4,setosa


In [None]:
df.info()

In [None]:
df.describe()  # describe() returns descriptive statistics for the numerical columns

In [None]:
df.describe(include="all")  # with include="all" the categorical columns are also added to the table

In [None]:
df.describe(include="object")  # with include="object" only the categorical columns are summarized

In [None]:
df.describe().transpose()  # or df.describe().T; swaps the index and the columns

In [None]:
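df.corr(numeric_only=True)  # species is not numeric; recent pandas versions raise an error without numeric_only=True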

In [None]:
df.corr(numeric_only=True)[["sepal_length"]]

In [None]:
df['petal_length'].corr(df["petal_width"])

In [None]:
df.species.value_counts()  # dropna=True by default, so missing values are not counted

In [None]:
df.species.unique()  # returns the distinct values (the same values that value_counts() counts)

In [None]:
df.species.nunique()  # number of unique values

In [None]:
df.loc[df["species"] == "setosa", "sepal_length"]

In [None]:
df[(df.sepal_length > 4) & (df.sepal_length < 5)]

In [None]:
df[(df.species == "virginica") & (df.sepal_length > 4)  & (df.sepal_length < 5)]

In [None]:
df.sort_values(by='sepal_length', ascending=True)  # sorts the rows by the sepal_length column in ascending order

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:150%; text-align:center; border-radius:10px 10px;">The End of The Session - 04</p>

<a id="12"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>


________