### sklearn.preprocessing.LabelEncoder
**_class_ sklearn.preprocessing.LabelEncoder**[[source]](https://github.com/scikit-learn/scikit-learn/blob/ff1023fda/sklearn/preprocessing/_label.py#L36)

Encode target labels with values between 0 and `n_classes - 1`.

This transformer should be used to encode target values, _i.e._ `y`, and not the input `X`.

Read more in the [User Guide](https://scikit-learn.org/stable/modules/preprocessing_targets.html#preprocessing-targets).

In [1]:
from sklearn.preprocessing import LabelEncoder

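# Korean product names: 냉장고 = refrigerator, 전자렌지 = microwave oven,
# 컴퓨터 = computer, 선풍기 = fan, 믹서 = mixer
items = ['TV', '냉장고', '전자렌지', '컴퓨터', '선풍기', '선풍기', '믹서', '믹서']

LE = LabelEncoder()
LE.fit(items)                 # learn the sorted set of unique labels
labels = LE.transform(items)  # map each label to its index in LE.classes_
print('Encoded values:', labels)

Encoded values: [0 1 4 5 3 3 2 2]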


In [2]:
LE.classes_

array(['TV', '냉장고', '믹서', '선풍기', '전자렌지', '컴퓨터'], dtype='<U4')

In [5]:
from sklearn.preprocessing import LabelEncoder

items = ['TV', '냉장고', '전자렌지', '컴퓨터', '선풍기', '선풍기', '믹서', '믹서']

LE = LabelEncoder()

# fit_transform combines fit and transform in a single call
labels = LE.fit_transform(items)
print('Encoded values:', labels)

Encoded values: [0 1 4 5 3 3 2 2]


In [6]:
LE.inverse_transform([0,0,1])

array(['TV', 'TV', '냉장고'], dtype='<U4')
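The cells above only encode labels that were present during `fit`. As a minimal sketch (the English labels and the variable names here are illustrative, not from the original notebook), the following shows that `classes_` is stored in sorted order and that `transform` is expected to raise a `ValueError` for a label it has never seen:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["TV", "fan", "mixer", "fan"])  # illustrative labels

# classes_ holds the unique labels in sorted order; the position is the code.
classes = list(le.classes_)
codes = le.transform(["fan", "TV"]).tolist()

# A label that was not present during fit cannot be encoded.
try:
    le.transform(["radio"])
    unseen_error = None
except ValueError as exc:
    unseen_error = str(exc)

print(classes, codes, unseen_error is not None)
```

Because the codes depend on sort order rather than on the order of appearance, the integers carry no meaning beyond identity, which is one reason the encoder is intended for `y` rather than for input features.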

### sklearn.preprocessing.OneHotEncoder
**_class_ sklearn.preprocessing.OneHotEncoder**(_*_, _categories='auto'_, _drop=None_, _sparse=True_, _dtype=<class 'numpy.float64'>_, _handle_unknown='error'_, _min_frequency=None_, _max_categories=None_)[[source]](https://github.com/scikit-learn/scikit-learn/blob/f3f51f9b6/sklearn/preprocessing/_encoders.py#L201)

Encode categorical features as a one-hot numeric array.

The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka ‘one-of-K’ or ‘dummy’) encoding scheme. This creates a binary column for each category and returns a sparse matrix or dense array (depending on the `sparse` parameter).

By default, the encoder derives the categories based on the unique values in each feature. Alternatively, you can also specify the `categories` manually.
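To illustrate the difference between inferred and manually specified categories, here is a small sketch. The data (`low`, `medium`, `high`) and variable names are made up for illustration; the encoder is left at its default sparse output and converted with `.toarray()` so the snippet does not depend on the `sparse` parameter name.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Illustrative single-feature data; "medium" never occurs in it.
X = np.array([["low"], ["high"], ["low"]], dtype=object)

# Default: categories are inferred from the data and stored sorted.
auto = OneHotEncoder().fit(X)

# Manual: one list per feature fixes both the set and the column order,
# including a category that is absent from the training data.
manual = OneHotEncoder(categories=[["low", "medium", "high"]]).fit(X)

auto_cols = auto.categories_[0].tolist()
manual_cols = manual.categories_[0].tolist()
encoded = manual.transform(X).toarray()

print(auto_cols)
print(manual_cols)
print(encoded)
```

Specifying the categories by hand can be useful when the training sample may not contain every value that is valid for the feature.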

This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels.

Note: a one-hot encoding of y labels should use a LabelBinarizer instead.
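As a brief sketch of what the note means, `LabelBinarizer` accepts a 1-D sequence of labels directly, whereas `OneHotEncoder` expects a 2-D feature array. The labels below are illustrative and not taken from the notebook.

```python
from sklearn.preprocessing import LabelBinarizer

y = ["TV", "fan", "mixer", "fan"]  # illustrative 1-D target labels

lb = LabelBinarizer()
Y = lb.fit_transform(y)            # one indicator column per class

binarizer_classes = lb.classes_.tolist()
restored = lb.inverse_transform(Y).tolist()

print(binarizer_classes)
print(Y)
print(restored)
```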

Read more in the  [User Guide](https://scikit-learn.org/1.1/modules/preprocessing.html#preprocessing-categorical-features).

In [7]:
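from sklearn.preprocessing import OneHotEncoder
import numpy as np

# One feature column with category counts a=5, b=20, c=10, d=3.
X = np.array([["a"] * 5 + ["b"] * 20 + ["c"] * 10 + ["d"] * 3], dtype=object).T
# max_categories=3 keeps the two most frequent categories (b, c) and groups
# the remaining ones (a, d) into a single "infrequent" column.
# Note: `sparse` is the scikit-learn 1.1 name; it was renamed `sparse_output` in 1.2.
ohe = OneHotEncoder(max_categories=3, sparse=False).fit(X)
ohe.infrequent_categories_

[array(['a', 'd'], dtype=object)]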

In [9]:
ohe.transform([["a"], ["b"]])

array([[0., 0., 1.],
       [1., 0., 0.]])

In [10]:
ohe_1 = OneHotEncoder(max_categories=3).fit(X)
ohe_1.transform([["a"], ["b"]])

<2x3 sparse matrix of type '<class 'numpy.float64'>'
	with 2 stored elements in Compressed Sparse Row format>

In [15]:
# With handle_unknown='ignore', an unseen category ("f") encodes to an all-zero row.
ohe_2 = OneHotEncoder(sparse=False, handle_unknown='ignore').fit(X)
ohe_2.transform([["a"], ["b"], ["f"]])

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 0., 0.]])
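An all-zero row cannot be mapped back to a category. The sketch below, which reuses the same `X` and relies on the documented behaviour that `inverse_transform` denotes an unknown category as `None`, shows the round trip. Variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([["a"] * 5 + ["b"] * 20 + ["c"] * 10 + ["d"] * 3], dtype=object).T

enc = OneHotEncoder(handle_unknown="ignore").fit(X)

# "f" was never seen during fit, so its row is all zeros.
rows = enc.transform([["a"], ["f"]]).toarray()

# Mapping back: the known category is recovered, the unknown one becomes None.
restored = enc.inverse_transform(rows).tolist()
print(rows)
print(restored)
```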

In [29]:
from sklearn.preprocessing import OneHotEncoder
import numpy as np
# OneHotEncoder expects a 2-D array of shape (n_samples, n_features),
# so reshape the 1-D list of product names into a single column.
items = np.array(['TV', '냉장고', '전자렌지', '컴퓨터', '선풍기', '선풍기', '믹서', '믹서']).reshape(-1, 1)

oh_encoder = OneHotEncoder(sparse=False)
result = oh_encoder.fit_transform(items)



In [30]:
result

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.]])

In [32]:
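import pandas as pd

# The columns of `result` follow the sorted order stored in oh_encoder.categories_,
# not the order in which the items first appear, so take the labels from the encoder.
pd.DataFrame(result, columns=oh_encoder.categories_[0])

| | TV | 냉장고 | 믹서 | 선풍기 | 전자렌지 | 컴퓨터 |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |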
