## sklearn.cluster.MeanShift
* _class_ `sklearn.cluster.MeanShift(*, bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True, n_jobs=None, max_iter=300)` [[source]](https://github.com/scikit-learn/scikit-learn/blob/8c9c1f27b/sklearn/cluster/_mean_shift.py#L263)

Mean shift clustering using a flat kernel.

Mean shift clustering aims to discover "blobs" in a smooth density of samples. It is a centroid-based algorithm: it repeatedly updates each candidate centroid to the mean of the points within a given region around it. A post-processing stage then filters these candidates, eliminating near-duplicates to form the final set of centroids.
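The procedure above can be sketched in NumPy as follows. This is a simplified illustration, not scikit-learn's implementation: `mean_shift_flat` is a hypothetical helper, every sample is used as a seed, and the convergence tolerance (`1e-3 * bandwidth`) and the duplicate-removal rule (drop a candidate lying within `bandwidth` of a more populated one) are assumptions chosen for the sketch. The "flat kernel" part is that every point inside the window gets equal weight in the mean.

```python
import numpy as np

def mean_shift_flat(X, bandwidth, max_iter=300):
    """Simplified flat-kernel mean shift (illustrative sketch, not sklearn's code)."""
    X = np.asarray(X, dtype=float)
    stop_thresh = 1e-3 * bandwidth  # assumption: convergence tolerance
    candidates = []
    for seed in X:  # every sample is a seed here (no binning)
        center = seed.copy()
        for _ in range(max_iter):
            # flat kernel: all points within `bandwidth` are weighted equally
            in_window = np.linalg.norm(X - center, axis=1) < bandwidth
            if not in_window.any():
                break
            new_center = X[in_window].mean(axis=0)
            shift = np.linalg.norm(new_center - center)
            center = new_center
            if shift < stop_thresh:
                break
        # "intensity" of the candidate: how many points fall in its window
        n_in_window = int((np.linalg.norm(X - center, axis=1) < bandwidth).sum())
        candidates.append((n_in_window, center))

    # post-processing (assumed rule): keep the most populated candidates first
    # and drop any candidate closer than `bandwidth` to one already kept
    candidates.sort(key=lambda c: c[0], reverse=True)
    centers = []
    for _, c in candidates:
        if all(np.linalg.norm(c - kept) >= bandwidth for kept in centers):
            centers.append(c)
    centers = np.array(centers)

    # assign every sample to its nearest surviving center
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return centers, labels

X = np.array([[0, 0], [0.5, 0], [0, 0.5], [10, 10], [10.5, 10], [10, 10.5]])
centers, labels = mean_shift_flat(X, bandwidth=2.0)
print(len(centers), labels)
```

On these two well-separated toy groups, every seed in a group converges to the same point, so only two centers survive the duplicate filter.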

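When `bin_seeding=True`, seeding is performed using a binning technique for scalability: instead of starting from every sample, the initial kernel locations are taken from a grid whose coarseness corresponds to the bandwidth.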
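A minimal sketch of the binning idea, assuming the grid spacing equals the bin size and that bins with fewer than `min_bin_freq` points are discarded. `bin_seeds` is a hypothetical helper written for illustration; scikit-learn's own helper may differ in details such as the order of the returned seeds.

```python
import numpy as np

def bin_seeds(X, bin_size, min_bin_freq=1):
    """Hypothetical sketch: one seed per sufficiently populated grid cell."""
    # snap every point to the nearest grid node with spacing `bin_size`
    binned = np.round(np.asarray(X, dtype=float) / bin_size)
    # count how many points landed in each grid cell (rows sorted lexicographically)
    bins, counts = np.unique(binned, axis=0, return_counts=True)
    # keep only populated cells and map them back to data coordinates
    return bins[counts >= min_bin_freq] * bin_size

X = np.array([[0.1, 0.2], [0.3, 0.1], [5.1, 4.9], [4.8, 5.2], [9.9, 0.1]])
print(bin_seeds(X, bin_size=1.0))                  # three cells are occupied
print(bin_seeds(X, bin_size=1.0, min_bin_freq=2))  # the lone point's cell is dropped
```

Five samples collapse to at most three seeds here, which is where the scalability gain comes from: the mean-shift iterations run once per seed rather than once per sample.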

Read more in the  [User Guide](https://scikit-learn.org/stable/modules/clustering.html#mean-shift).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import MeanShift

# 200 two-dimensional samples drawn around 3 centers
X, y = make_blobs(n_samples=200, n_features=2, centers=3,
                  cluster_std=0.7, random_state=0)

meanshift = MeanShift(bandwidth=0.8)
cluster_labels = meanshift.fit_predict(X)
print('Unique cluster labels:', np.unique(cluster_labels))
```

```
Unique cluster labels: [0 1 2 3 4 5]
```


```python
meanshift = MeanShift(bandwidth=1.865)
cluster_labels = meanshift.fit_predict(X)
print('Unique cluster labels:', np.unique(cluster_labels))
```

```
Unique cluster labels: [0 1 2]
```


```python
meanshift = MeanShift(bandwidth=1)
cluster_labels = meanshift.fit_predict(X)
print('Unique cluster labels:', np.unique(cluster_labels))
```

```
Unique cluster labels: [0 1 2]
```


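Next, estimate a bandwidth from the data with `estimate_bandwidth()` and rerun the clustering with that value. The estimate is a heuristic based on nearest-neighbor distances, so it is a sensible starting point rather than a guaranteed optimum.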

```python
from sklearn.cluster import estimate_bandwidth

# Heuristic bandwidth estimate computed from X itself
bandwidth = estimate_bandwidth(X)

print('bandwidth value:', round(bandwidth, 3))
```

```
bandwidth value: 1.816
```
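To see what kind of number `estimate_bandwidth()` returns, here is a NumPy sketch of the heuristic as we understand it with default settings and no subsampling: for each sample, take the distance to its k-th nearest neighbor (counting the sample itself), where k = int(n_samples * quantile), and average those distances over all samples. The helper name `estimate_bandwidth_sketch` is hypothetical, the default `quantile=0.3` is taken from the library's signature, and this is not a drop-in replacement for the library function.

```python
import numpy as np

def estimate_bandwidth_sketch(X, quantile=0.3):
    """Hypothetical sketch of a k-nearest-neighbor bandwidth heuristic."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = max(int(n * quantile), 1)  # neighbors per sample, the sample itself included
    # full pairwise Euclidean distances (O(n^2) memory, fine for small X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    D.sort(axis=1)  # column 0 is each sample's zero distance to itself
    return float(D[:, k - 1].mean())

X_toy = np.array([[0, 0], [1, 0], [3, 0], [6, 0]])
print(estimate_bandwidth_sketch(X_toy, quantile=0.5))
```

With four points and `quantile=0.5`, k is 2, so each point contributes the distance to its nearest other point (1, 1, 2 and 3), giving 7/4 = 1.75. A larger `quantile` reaches further into each neighborhood and yields a larger bandwidth, which in turn tends to merge more blobs into fewer clusters.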


```python
import pandas as pd

# Keep the features and the true make_blobs labels together for later inspection
cluster_df = pd.DataFrame(data=X, columns=['ftr1', 'ftr2'])
cluster_df['target'] = y

# Compute the bandwidth with estimate_bandwidth()
best_bandwidth = estimate_bandwidth(X)

meanshift = MeanShift(bandwidth=best_bandwidth)
cluster_labels = meanshift.fit_predict(X)
print('Unique cluster labels:', np.unique(cluster_labels))
```

```
Unique cluster labels: [0 1 2]
```
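The `cluster_df` frame above stores the true `target` but the cell never compares it with the mean-shift result. One way to do so is to store the labels with `cluster_df['meanshift_label'] = cluster_labels` and cross-tabulate the two columns with `pd.crosstab`. Because cluster ids are arbitrary, a good clustering shows up as each `target` row concentrating in a single column, not as matching numbers. The sketch below uses small hypothetical stand-in values so it runs on its own; they are not results from the data above.

```python
import pandas as pd

# Hypothetical stand-ins for cluster_df['target'] and the mean-shift labels
toy = pd.DataFrame({'target':          [0, 0, 0, 1, 1, 2, 2, 2],
                    'meanshift_label': [2, 2, 2, 0, 0, 1, 1, 0]})

# Rows: true make_blobs center; columns: cluster id assigned by mean shift
table = pd.crosstab(toy['target'], toy['meanshift_label'])
print(table)
```

Here target 0 maps entirely to cluster 2 and target 1 to cluster 0, while one sample of target 2 lands in cluster 0, which the table makes easy to spot.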
