# Machine Learning - Logistic Regression
  
**2019-2023 [FinanceData.KR]()**

## Logistic Function
The logistic function is also known as the sigmoid function. It is a squashing function: it compresses an arbitrarily large input range into a small, bounded output range.




In [None]:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10, 1000)
plt.plot(x, 1/(1+np.exp(-x)), label="logistic")
plt.legend(loc=2)
plt.grid(True)
plt.show()

1. Output range: 0 to 1 (can be read as a <u>**probability**</u>)
1. Monotonically increasing function
1. Binary classification (the threshold is typically 0.5)
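
The three properties above can be checked numerically. The following is a minimal sketch; the helper name `logistic` and the variable names are chosen here for illustration and are not part of the original notebook.

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid) function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 1000)
p = logistic(x)

# 1. Every output lies strictly between 0 and 1.
in_range = bool(np.all((p > 0) & (p < 1)))

# 2. The outputs strictly increase as the input grows.
is_monotonic = bool(np.all(np.diff(p) > 0))

# 3. Thresholding the output at 0.5 is the same as checking the sign of the input,
#    because logistic(0) == 0.5.
labels = (p >= 0.5).astype(int)
same_as_sign = bool(np.array_equal(labels, (x >= 0).astype(int)))

print(in_range, is_monotonic, same_as_sign)
```

Because the function is monotonic, any threshold on the probability corresponds to exactly one threshold on the raw input, which is why a linear score can be turned into a class label.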


## LogisticRegression

In [None]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
x = [[-2,-3], [-5, -10], [10,3], [9,6]]
y = [1,1,0,0]

model.fit(x,y)
print(model.predict([[-1,-4]]))

In [None]:
print(model.coef_)
print(model.intercept_)

## Logistic Regression Example #1 - Synthetic Data

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Generate synthetic data: 150 samples, 2 features, 2 clusters
X, y = make_blobs(n_samples=150, n_features=2, centers=2, cluster_std=1, shuffle=True, random_state=0)

In [None]:
# Inspect the first 5 rows (1)
print(X[:5])
print(y[:5])

In [None]:
# Inspect the first 5 rows (2)
import pandas as pd

df = pd.DataFrame(np.c_[X, y], columns=['x1', 'x2', 'y'])
df[:5] # first 5 rows

In [None]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(X, y)

print('Score: ', lr.score(X, y))

In [None]:
## Try predicting
lr.predict(X[:5])

In [None]:
## Check the probabilities (1)
lr.predict_proba(X[:5])

In [None]:
## Check the probabilities (2)
df = pd.DataFrame(lr.predict_proba(X), columns=['Class 0', 'Class 1'])
df[:5] # first 5 rows

In [None]:
## Check the coefficients
lr.coef_, lr.intercept_
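
The fitted coefficients describe the decision boundary directly: it is the line where `w1*x1 + w2*x2 + b = 0`. The sketch below regenerates the same data so it runs on its own and solves that equation for `x2`. It assumes `w2` is nonzero, which should hold for this data set; the names `w1`, `w2`, `b`, and `boundary_points` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=150, n_features=2, centers=2, cluster_std=1, shuffle=True, random_state=0)
lr = LogisticRegression().fit(X, y)

# One weight per feature, plus the intercept
w1, w2 = lr.coef_[0]
b = lr.intercept_[0]

# Boundary: w1*x1 + w2*x2 + b = 0  ->  x2 = -(w1*x1 + b) / w2  (assumes w2 != 0)
x1_line = np.linspace(X[:, 0].min(), X[:, 0].max(), 5)
x2_line = -(w1 * x1_line + b) / w2

# Points on the boundary should get a class-1 probability of about 0.5
boundary_points = np.c_[x1_line, x2_line]
boundary_proba = lr.predict_proba(boundary_points)[:, 1]
print(boundary_proba)
```

Plotting `x1_line` against `x2_line` would draw the same boundary that the filled contour plot shows, without evaluating the model on a dense grid.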

In [None]:
# Visualize the decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))
Z = lr.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, marker='o', edgecolors='k')
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Logistic Regression Decision Boundary")
plt.show()

## Full Code

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Generate the data
X, y = make_blobs(n_samples=150, n_features=2, centers=2, cluster_std=1, shuffle=True, random_state=0)

# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X, y)

# Build a grid for visualizing the decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))
# Predict with `model`, the estimator trained in this cell
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, marker='o', edgecolors='k')
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Logistic Regression Decision Boundary")
plt.show()

## Measuring Uncertainty

The `decision_function` method returns the decision values (decisions).

```
decisions = X @ coef_.T + intercept_
```

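A decision value expresses how strongly the model believes a sample belongs to class 1. Only its sign determines the predicted class: greater than 0 means positive, less than 0 means negative.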
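
To make the formula concrete, the following sketch recomputes the decision values by hand and compares them with `decision_function`. It regenerates the same data so that it runs on its own; the names `manual` and `labels_from_sign` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=150, n_features=2, centers=2, cluster_std=1, shuffle=True, random_state=0)
model = LogisticRegression().fit(X, y)

# Linear score computed by hand: coef_ has shape (1, 2), intercept_ has shape (1,)
manual = (X[:5] @ model.coef_.T + model.intercept_).ravel()
decisions = model.decision_function(X[:5])

# The sign of the decision value gives the predicted class (classes are 0 and 1 here)
labels_from_sign = (decisions > 0).astype(int)

print(np.allclose(manual, decisions))
print(np.array_equal(labels_from_sign, model.predict(X[:5])))
```

The magnitude of a decision value is not bounded, so it is hard to compare across models; mapping it through the logistic function gives a value between 0 and 1 instead.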

In [None]:
## Get the decision values (decisions) from decision_function
decisions = model.decision_function(X[:5])
decisions

Checking the probabilities (probability of the positive class)
* expit (also known as the logistic function or sigmoid function)
* expit(x) = 1/(1+exp(-x))

In [None]:
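# Import expit explicitly; a bare `import scipy` may not load scipy.special in older SciPy versions
from scipy.special import expit

expit(decisions)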
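
As a cross-check, the values from `expit` should agree with the class-1 column of `predict_proba` for this binary model. The sketch below regenerates the data so it is self-contained; the names `proba_pos` and `proba` are introduced here for illustration.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=150, n_features=2, centers=2, cluster_std=1, shuffle=True, random_state=0)
model = LogisticRegression().fit(X, y)

# Positive-class probability derived from the decision values
proba_pos = expit(model.decision_function(X[:5]))

# Columns of predict_proba are ordered as in model.classes_, i.e. [class 0, class 1]
proba = model.predict_proba(X[:5])

print(np.allclose(proba_pos, proba[:, 1]))      # class-1 column matches expit(decisions)
print(np.allclose(1 - proba_pos, proba[:, 0]))  # class-0 column is the complement
```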
