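# IRIS data classification

Independent variables: sepal width (septal_width), sepal length (septal_length), petal width (petal_width), petal length (petal_length). The "septal" spelling follows the column names in the CSV file.

Dependent variable: iris species (setosa, versicolor, virginica)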



# Loss: categorical cross-entropy

Two ways to use cross-entropy as the loss:
- categorical_crossentropy
- sparse_categorical_crossentropy

## categorical_crossentropy
When the y values are one-hot encoded:
```
1,0,0
0,1,0
0,0,1
```

Output layer setup
```
model.add(Dense(3, activation="softmax")) # output layer
```

Loss setup
```
model.compile(..., loss='categorical_crossentropy')
```


## sparse_categorical_crossentropy
When the y values are integer class indices (not one-hot encoded):
```
0
1
2
```

Output layer setup
```
model.add(Dense(3, activation="softmax")) # output layer: 3 units (the number of classes), not 1
```

Loss setup
```
model.compile(..., loss='sparse_categorical_crossentropy')
```

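Summary for multi-class classification with n classes:

Dense(n, activation="softmax"), loss="categorical_crossentropy" (one-hot labels)

Dense(n, activation="softmax"), loss="sparse_categorical_crossentropy" (integer index labels)

Binary classification: Dense(1, activation="sigmoid"), loss="binary_crossentropy"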
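
To make the relationship between the two losses concrete, here is a minimal NumPy sketch. It is not the Keras implementation (Keras also clips probabilities for numerical stability); the function names mirror the Keras loss names only for readability, and the probability values are hypothetical. It shows that one-hot labels and integer index labels give the same loss value for the same predictions.

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs):
    # Mean over samples of -sum_k y_k * log(p_k); y is one-hot
    return float(np.mean(-np.sum(y_onehot * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(y_index, probs):
    # Mean over samples of -log(p[true class]); y is an integer class index
    picked = probs[np.arange(len(y_index)), y_index]
    return float(np.mean(-np.log(picked)))

# Hypothetical softmax outputs for three samples (each row sums to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])

y_index = np.array([0, 1, 2])   # integer labels
y_onehot = np.eye(3)[y_index]   # the same labels, one-hot encoded

loss_onehot = categorical_crossentropy(y_onehot, probs)
loss_sparse = sparse_categorical_crossentropy(y_index, probs)
print(loss_onehot, loss_sparse)
```

The choice between the two losses therefore depends only on how the labels are stored, not on the model's output layer.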

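## Activation functions: sigmoid, softmax, logit

sigmoid is used for binary classification (2 classes),

softmax is used for multi-class classification (K classes).

The difference is whether the model handles 2 classes or K classes.


The sigmoid and logit functions are inverses of each other.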
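
The two statements above can be checked numerically. The following is a small NumPy sketch (the helper names `sigmoid`, `logit`, and `softmax` are defined here for illustration, not imported from a library): applying `logit` to `sigmoid(z)` recovers `z`, and a two-class softmax over the scores `[z, 0]` gives the same probability as `sigmoid(z)`.

```python
import numpy as np

def sigmoid(z):
    # Maps a real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # Log-odds: maps a probability in (0, 1) back to a real-valued score
    return np.log(p / (1.0 - p))

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])

# logit undoes sigmoid
roundtrip = logit(sigmoid(z))

# Softmax over the two scores [v, 0] assigns class 0 the probability sigmoid(v)
two_class = np.array([softmax(np.array([v, 0.0]))[0] for v in z])

print(roundtrip)
print(two_class, sigmoid(z))
```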

# Importing modules

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import optimizers
from tensorflow.keras.layers import Dense, Input

# Loading the data

In [2]:
!wget https://raw.githubusercontent.com/dhrim/MDC_2021/master/material/deep_learning/iris.csv 

--2022-01-02 11:52:18--  https://raw.githubusercontent.com/dhrim/MDC_2021/master/material/deep_learning/iris.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2720 (2.7K) [text/plain]
Saving to: ‘iris.csv’


2022-01-02 11:52:19 (48.1 MB/s) - ‘iris.csv’ saved [2720/2720]



In [3]:
!ls -al
!head iris.csv

total 20
drwxr-xr-x 1 root root 4096 Jan  2 11:52 .
drwxr-xr-x 1 root root 4096 Jan  2 11:49 ..
drwxr-xr-x 4 root root 4096 Dec  3 14:33 .config
-rw-r--r-- 1 root root 2720 Jan  2 11:52 iris.csv
drwxr-xr-x 1 root root 4096 Dec  3 14:33 sample_data
septal_length,septal_width,petal_length,petal_width,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,0,0,1
5.0,2.3,3.3,1.0,0,1,0
4.9,2.5,4.5,1.7,0,0,1
4.9,3.1,1.5,0.1,1,0,0
5.7,3.8,1.7,0.3,1,0,0
4.4,3.2,1.3,0.2,1,0,0
5.4,3.4,1.5,0.4,1,0,0
6.9,3.1,5.1,2.3,0,0,1
6.7,3.1,4.4,1.4,0,1,0


In [4]:
iris = pd.read_csv("iris.csv")
iris.head()

   septal_length  septal_width  petal_length  petal_width  setosa  versicolor  virginica
0            6.4           2.8           5.6          2.2       0           0          1
1            5.0           2.3           3.3          1.0       0           1          0
2            4.9           2.5           4.5          1.7       0           0          1
3            4.9           3.1           1.5          0.1       1           0          0
4            5.7           3.8           1.7          0.3       1           0          0


In [5]:
iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   septal_length  120 non-null    float64
 1   septal_width   120 non-null    float64
 2   petal_length   120 non-null    float64
 3   petal_width    120 non-null    float64
 4   setosa         120 non-null    int64  
 5   versicolor     120 non-null    int64  
 6   virginica      120 non-null    int64  
dtypes: float64(4), int64(3)
memory usage: 6.7 KB


# Converting the data to NumPy

In [6]:
data = iris.to_numpy()
print(data.shape)
print(data[:5])

(120, 7)
[[6.4 2.8 5.6 2.2 0.  0.  1. ]
 [5.  2.3 3.3 1.  0.  1.  0. ]
 [4.9 2.5 4.5 1.7 0.  0.  1. ]
 [4.9 3.1 1.5 0.1 1.  0.  0. ]
 [5.7 3.8 1.7 0.3 1.  0.  0. ]]


# Splitting into x and y

Split into train and test data

In [15]:
x = data[:,:4]
y = data[:,4:]

split_index = 100

train_x, test_x = x[:split_index], x[split_index:]
train_y, test_y = y[:split_index], y[split_index:]

In [16]:
print(train_x.shape)
print(train_y.shape)

(100, 4)
(100, 3)
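
The slice-based split above keeps the first 100 rows for training, which is fine when the rows are already in random order (the first rows of iris.csv suggest they are). If a file were sorted by class, one option would be to shuffle before splitting. The sketch below assumes nothing beyond NumPy; `shuffled_split` and the stand-in array `demo` are hypothetical names used only for illustration.

```python
import numpy as np

def shuffled_split(x, y, n_train, seed=0):
    # Permute row indices once, then slice the permutation into train/test parts
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Stand-in array with the same shape as `data` in this notebook: (120, 7)
demo = np.arange(120 * 7, dtype=float).reshape(120, 7)

tr_x, te_x, tr_y, te_y = shuffled_split(demo[:, :4], demo[:, 4:], n_train=100)
print(tr_x.shape, tr_y.shape, te_x.shape, te_y.shape)
```

Using the same index permutation for x and y keeps each feature row paired with its label.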


# Defining the DNN model

In [20]:
model = keras.Sequential()
model.add(Input(shape=(4,)))  # 4 input features
model.add(Dense(10, activation='tanh'))
model.add(Dense(10, activation='tanh'))
model.add(Dense(3, activation='softmax'))  # 3 output units, one per class (not 1)

In [21]:
# Compile: model + optimizer + loss
# categorical_crossentropy: the usual loss for classification; what matters is the overall pattern of the 3 output values

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

In [22]:
# Training
model.fit(train_x, train_y, epochs=200, verbose=0)

<keras.callbacks.History at 0x7f21a2ba4610>

In [23]:
# Evaluation
loss, acc = model.evaluate(test_x, test_y)

print("loss : ", loss)
print("acc : ", acc)

loss :  0.06622647494077682
acc :  1.0


In [24]:
# Prediction
y_ = model.predict(test_x)
print(y_)
print(np.argmax(y_, axis=1))

[[9.6089000e-01 3.8931318e-02 1.7874280e-04]
 [3.0971266e-02 9.5300895e-01 1.6019778e-02]
 [9.4530720e-01 5.4435868e-02 2.5691016e-04]
 [2.3648130e-02 9.5351011e-01 2.2841746e-02]
 [9.6108860e-01 3.8721010e-02 1.9047182e-04]
 [9.6443981e-01 3.5397124e-02 1.6300728e-04]
 [9.6551126e-01 3.4329023e-02 1.5979764e-04]
 [9.6575814e-01 3.4078483e-02 1.6340066e-04]
 [3.0117378e-02 9.5443225e-01 1.5450320e-02]
 [9.5858389e-01 4.1226067e-02 1.8999736e-04]
 [9.4673093e-03 2.1544579e-01 7.7508688e-01]
 [2.4839116e-02 9.1516370e-01 5.9997223e-02]
 [9.5665401e-01 4.3156222e-02 1.8981767e-04]
 [7.3967135e-04 9.2157898e-03 9.9004459e-01]
 [9.6028674e-01 3.9507184e-02 2.0610649e-04]
 [2.2942852e-02 7.6793820e-01 2.0911896e-01]
 [2.4398500e-02 9.3237752e-01 4.3223966e-02]
 [9.5962006e-01 4.0184107e-02 1.9589170e-04]
 [9.6154696e-01 3.8280454e-02 1.7245514e-04]
 [2.5617525e-02 9.5589024e-01 1.8492142e-02]]
[0 1 0 1 0 0 0 0 1 0 2 1 0 2 0 1 1 0 0 1]
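
As a small illustration of how the `np.argmax` output can be read, the sketch below maps predicted indices to species names and computes accuracy by hand. The column order (setosa, versicolor, virginica) comes from the CSV header, and the three probability rows are copied from the output above. The one-hot `y_true` rows are an assumption, chosen to be consistent with the reported accuracy of 1.0.

```python
import numpy as np

# Class order follows the one-hot columns in iris.csv
species = np.array(["setosa", "versicolor", "virginica"])

# First three rows of the prediction output above
y_pred = np.array([[9.6089000e-01, 3.8931318e-02, 1.7874280e-04],
                   [3.0971266e-02, 9.5300895e-01, 1.6019778e-02],
                   [9.4530720e-01, 5.4435868e-02, 2.5691016e-04]])

# Assumed one-hot labels for those rows (not printed in the notebook)
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

pred_idx = np.argmax(y_pred, axis=1)   # index of the most probable class
true_idx = np.argmax(y_true, axis=1)   # index of the 1 in each one-hot row
accuracy = float(np.mean(pred_idx == true_idx))

print(species[pred_idx])
print(accuracy)
```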


# iris_dnn with category index

The code below is based on the code in dnn_iris_and_optimizer.ipynb.


# Loading the data

In [25]:
!wget https://raw.githubusercontent.com/dhrim/MDC_2021/master/material/deep_learning/iris_with_category_index.csv

--2022-01-02 14:32:10--  https://raw.githubusercontent.com/dhrim/MDC_2021/master/material/deep_learning/iris_with_category_index.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2218 (2.2K) [text/plain]
Saving to: ‘iris_with_category_index.csv’


2022-01-02 14:32:11 (24.7 MB/s) - ‘iris_with_category_index.csv’ saved [2218/2218]



In [26]:
!ls -al
!head iris_with_category_index.csv

total 24
drwxr-xr-x 1 root root 4096 Jan  2 14:32 .
drwxr-xr-x 1 root root 4096 Jan  2 11:49 ..
drwxr-xr-x 4 root root 4096 Dec  3 14:33 .config
-rw-r--r-- 1 root root 2720 Jan  2 11:52 iris.csv
-rw-r--r-- 1 root root 2218 Jan  2 14:32 iris_with_category_index.csv
drwxr-xr-x 1 root root 4096 Dec  3 14:33 sample_data
septal_length,septal_width,petal_length,petal_width,class
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
5.7,3.8,1.7,0.3,0
4.4,3.2,1.3,0.2,0
5.4,3.4,1.5,0.4,0
6.9,3.1,5.1,2.3,2
6.7,3.1,4.4,1.4,1


In [27]:
iris = pd.read_csv("iris_with_category_index.csv")
iris.head()

   septal_length  septal_width  petal_length  petal_width  class
0            6.4           2.8           5.6          2.2      2
1            5.0           2.3           3.3          1.0      1
2            4.9           2.5           4.5          1.7      2
3            4.9           3.1           1.5          0.1      0
4            5.7           3.8           1.7          0.3      0


# Converting the data to NumPy

In [28]:
data = iris.to_numpy()
print(data.shape)
print(data[:5])

(120, 5)
[[6.4 2.8 5.6 2.2 2. ]
 [5.  2.3 3.3 1.  1. ]
 [4.9 2.5 4.5 1.7 2. ]
 [4.9 3.1 1.5 0.1 0. ]
 [5.7 3.8 1.7 0.3 0. ]]


# Splitting into x and y

Split into train and test data

In [29]:
x = data[:,:4]
y = data[:,4:]

split_index = 100

train_x, test_x = x[:split_index], x[split_index:]
train_y, test_y = y[:split_index], y[split_index:]

print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)

(100, 4)
(100, 1)
(20, 4)
(20, 1)
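
Here the label column has shape (n, 1) and holds the class index as a float, because `to_numpy()` produced a single float array. The sketch below, using the first five class values printed above, shows one way to turn such a column into a flat integer vector and, if one-hot labels were wanted for `categorical_crossentropy`, into a one-hot matrix. The variable names are illustrative only.

```python
import numpy as np

# First five class values from the data above, shaped (5, 1) like train_y
y_col = np.array([[2.0], [1.0], [2.0], [0.0], [0.0]])

y_index = y_col.ravel().astype(int)   # flat integer class indices, shape (5,)
y_onehot = np.eye(3)[y_index]         # row i of the identity matrix encodes class i

print(y_index)
print(y_onehot)
```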


# Defining the DNN model

In [30]:
model = keras.Sequential()
model.add(Input(shape=(4,)))  # 4 input features
model.add(Dense(10, activation='tanh'))
model.add(Dense(10, activation='tanh'))
model.add(Dense(3, activation='softmax'))

# model.compile(optimizer="SGD", loss="categorical_crossentropy", metrics=["accuracy"])
model.compile(optimizer="SGD", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()

# Training
model.fit(train_x, train_y, epochs=1000, verbose=0, batch_size=20)

# Evaluation
loss, acc = model.evaluate(test_x, test_y)
print("loss=", loss)
print("acc=", acc)

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 10)                50        
                                                                 
 dense_10 (Dense)            (None, 10)                110       
                                                                 
 dense_11 (Dense)            (None, 3)                 33        
                                                                 
Total params: 193
Trainable params: 193
Non-trainable params: 0
_________________________________________________________________
loss= 0.02841911092400551
acc= 1.0
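
The Param # column in the summary above can be reproduced by hand: a Dense layer has one weight per input-output pair plus one bias per output unit. A minimal sketch (the helper `dense_params` is a hypothetical name, not a Keras function):

```python
def dense_params(n_in, n_out):
    # n_in * n_out weights plus n_out biases
    return n_in * n_out + n_out

# Layer widths of the model above: 4 inputs -> 10 -> 10 -> 3 outputs
layer_sizes = [4, 10, 10, 3]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts, sum(counts))  # [50, 110, 33] 193
```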


In [31]:
# Prediction
y_ = model.predict(test_x)
print(y_)
print(np.argmax(y_, axis=1))

[[9.9329364e-01 6.6666580e-03 3.9628983e-05]
 [1.0090773e-02 9.8720634e-01 2.7028332e-03]
 [9.8827714e-01 1.1657994e-02 6.4856817e-05]
 [4.9821213e-03 9.9218810e-01 2.8297752e-03]
 [9.9318945e-01 6.7703058e-03 4.0263832e-05]
 [9.9440259e-01 5.5634724e-03 3.4065823e-05]
 [9.9465972e-01 5.3078677e-03 3.2400207e-05]
 [9.9464107e-01 5.3262580e-03 3.2647113e-05]
 [1.7302429e-02 9.8013395e-01 2.5635757e-03]
 [9.9234653e-01 7.6102279e-03 4.3148277e-05]
 [2.6980147e-04 2.2504829e-01 7.7468199e-01]
 [3.8195883e-03 9.8803955e-01 8.1408657e-03]
 [9.9208778e-01 7.8676604e-03 4.4538654e-05]
 [7.8208839e-05 6.4070061e-02 9.3585175e-01]
 [9.9300224e-01 6.9551286e-03 4.2666234e-05]
 [1.4845027e-03 9.0761197e-01 9.0903454e-02]
 [5.9467726e-03 9.8937654e-01 4.6766982e-03]
 [9.9240983e-01 7.5466880e-03 4.3580931e-05]
 [9.9346381e-01 6.4978707e-03 3.8284310e-05]
 [4.9634501e-03 9.9220699e-01 2.8296346e-03]]
[0 1 0 1 0 0 0 0 1 0 2 1 0 2 0 1 1 0 0 1]
