## Analyzing cars.txt with SVC

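This is a car evaluation dataset. The class variable is the overall evaluation of a car,
with four values (unacc, acc, good, vgood) meaning (unacceptable, acceptable, good, very good).

The six attribute variables are purchase price, maintenance cost, number of doors, passenger capacity, luggage boot size, and safety.
All six attributes are ordered categorical variables: for example, passenger capacity takes the values "2, 4, more" and safety takes the values "low, med, high".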


The columns are, in order: price, maint, doors, persons, lug_boot, safty, and recommend (the evaluation label).

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.svm import SVC

In [2]:
# Attributes: purchase price, maintenance cost, number of doors, passenger capacity, luggage boot size, safety
# Columns: price, maint, doors, persons, lug_boot, safty, recommend (evaluation label)
df = pd.read_csv('./cars.txt',header=None)
df.columns=['price','maint','doors','persons','lug_boot','safty','recommend']
df

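|  | price | maint | doors | persons | lug_boot | safty | recommend |
|---|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low | unacc |
| 1 | vhigh | vhigh | 2 | 2 | small | med | unacc |
| 2 | vhigh | vhigh | 2 | 2 | small | high | unacc |
| 3 | vhigh | vhigh | 2 | 2 | med | low | unacc |
| 4 | vhigh | vhigh | 2 | 2 | med | med | unacc |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1723 | low | low | 5more | more | med | med | good |
| 1724 | low | low | 5more | more | med | high | vgood |
| 1725 | low | low | 5more | more | big | low | unacc |
| 1726 | low | low | 5more | more | big | med | good |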


#### Inspect the types and convert strings to ints

In [3]:
data = df.iloc[:,:-1].copy()
data

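|  | price | maint | doors | persons | lug_boot | safty |
|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low |
| 1 | vhigh | vhigh | 2 | 2 | small | med |
| 2 | vhigh | vhigh | 2 | 2 | small | high |
| 3 | vhigh | vhigh | 2 | 2 | med | low |
| 4 | vhigh | vhigh | 2 | 2 | med | med |
| ... | ... | ... | ... | ... | ... | ... |
| 1723 | low | low | 5more | more | med | med |
| 1724 | low | low | 5more | more | med | high |
| 1725 | low | low | 5more | more | big | low |
| 1726 | low | low | 5more | more | big | med |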


In [4]:
target = df.recommend.copy()
target

0       unacc
1       unacc
2       unacc
3       unacc
4       unacc
        ...  
1723     good
1724    vgood
1725    unacc
1726     good
1727    vgood
Name: recommend, Length: 1728, dtype: object

In [5]:
# factorize() assigns integer codes in order of first appearance
for col in data.columns:
    data[col] = data[col].factorize()[0]
data

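|  | price | maint | doors | persons | lug_boot | safty |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 0 | 2 |
| 3 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 1723 | 3 | 3 | 3 | 2 | 1 | 1 |
| 1724 | 3 | 3 | 3 | 2 | 1 | 2 |
| 1725 | 3 | 3 | 3 | 2 | 2 | 0 |
| 1726 | 3 | 3 | 3 | 2 | 2 | 1 |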
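
`factorize()` numbers the values in order of first appearance, so the codes only follow the natural ordering when the file happens to list the values that way. As a hedged alternative, the sketch below spells the ordering out with an ordered `pd.Categorical`. The category lists in `orders` and the helper name `encode_ordinal` are assumptions made for illustration, inferred from the values shown in the tables, and are not part of the original notebook.

```python
import pandas as pd

# Assumed category orders, lowest to highest; adjust if cars.txt uses other labels.
orders = {
    'price': ['low', 'med', 'high', 'vhigh'],
    'maint': ['low', 'med', 'high', 'vhigh'],
    'doors': ['2', '3', '4', '5more'],
    'persons': ['2', '4', 'more'],
    'lug_boot': ['small', 'med', 'big'],
    'safty': ['low', 'med', 'high'],
}

def encode_ordinal(frame, orders):
    """Return a new frame where each column holds the rank of its value in orders."""
    encoded = pd.DataFrame(index=frame.index)
    for col, cats in orders.items():
        # astype(str) guards against columns that were parsed as numbers
        cat = pd.Categorical(frame[col].astype(str), categories=cats, ordered=True)
        encoded[col] = cat.codes  # -1 marks a value missing from the assumed list
    return encoded

# Tiny hand-made sample in the same layout as the dataset
sample = pd.DataFrame({
    'price': ['vhigh', 'low'],
    'maint': ['vhigh', 'low'],
    'doors': ['2', '5more'],
    'persons': ['2', 'more'],
    'lug_boot': ['small', 'big'],
    'safty': ['low', 'high'],
})
print(encode_ordinal(sample, orders))
```

With this encoding a larger code always means "more" of the attribute, and an unexpected label shows up as -1 instead of silently receiving a new code.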


#### Split the dataset

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
x_train,x_test,y_train,y_test = train_test_split(data,target,test_size=0.2)
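
The split above is random and unseeded, so every run yields different train and test sets, and rare classes such as good and vgood may be unevenly represented. A minimal sketch of a reproducible, stratified split follows. It runs on synthetic stand-in data (`demo_X`, `demo_y`), and the `random_state` value is an arbitrary choice, not something the original notebook used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data/target with an imbalanced label distribution
rng = np.random.default_rng(0)
demo_X = pd.DataFrame(rng.integers(0, 4, size=(200, 6)))
demo_y = pd.Series(['unacc'] * 140 + ['acc'] * 40 + ['good'] * 10 + ['vgood'] * 10)

# stratify keeps the class proportions equal in both parts;
# random_state makes the split repeatable
X_tr, X_te, y_tr, y_te = train_test_split(
    demo_X, demo_y, test_size=0.2, stratify=demo_y, random_state=42
)
print(y_te.value_counts())
```

Applied to the notebook's own variables, the equivalent call would pass `stratify=target` along with a fixed `random_state`.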

#### Using SVC

In [8]:
from sklearn.model_selection import GridSearchCV

In [9]:
svc = SVC()
param_grid = {
    'C': [0.1, 1.0, 10, 100, 200],
    'gamma': [0.01, 0.05, 0.1, 0.5, 1],
}

gv = GridSearchCV(estimator=svc, param_grid=param_grid, n_jobs=-1, cv=5)

In [16]:
gv.fit(x_train, y_train)

In [17]:
# Best parameters
gv.best_params_

{'C': 10, 'gamma': 0.5}

In [24]:
# Best score (mean cross-validated accuracy of the best parameter combination)
gv.best_score_

0.9898681525663161

In [20]:
# Best model (refit on the full training set)
best_svc = gv.best_estimator_
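
Beyond the single best combination, the fitted search object keeps the score of every grid point in `cv_results_`, which helps judge how sensitive the model is to `C` and `gamma`. The sketch below shows one way to tabulate it. It fits a small grid on synthetic data from `make_classification` so that it runs on its own; the names `search`, `X_demo`, `y_demo`, and `results` are illustrative and not from the original notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for x_train / y_train
X_demo, y_demo = make_classification(n_samples=120, n_features=6, random_state=0)

search = GridSearchCV(
    estimator=SVC(),
    param_grid={'C': [1.0, 10], 'gamma': [0.1, 0.5]},
    cv=5,
)
search.fit(X_demo, y_demo)

# One row per parameter combination, best-ranked first
results = (
    pd.DataFrame(search.cv_results_)
    [['param_C', 'param_gamma', 'mean_test_score', 'std_test_score', 'rank_test_score']]
    .sort_values('rank_test_score')
    .reset_index(drop=True)
)
print(results)
```

The same table can be built from the notebook's `gv` object by replacing `search` with `gv`.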

#### Inspect with a crosstab

In [26]:
y_pred = best_svc.predict(x_test)

In [28]:
pd.crosstab(
    index=y_pred
    ,columns=y_test
    ,rownames=['predicted']
    ,colnames=['actual']
)

| predicted \ actual | acc | good | unacc | vgood |
|---|---|---|---|---|
| acc | 76 | 0 | 0 | 1 |
| good | 3 | 7 | 0 | 0 |
| unacc | 2 | 0 | 240 | 0 |
| vgood | 0 | 0 | 0 | 17 |
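
The crosstab can be turned into summary numbers directly. The sketch below copies the counts from the table above into a NumPy array (rows are predicted classes, columns are actual classes) and derives the test accuracy plus per-class precision and recall. The variable names are illustrative; `sklearn.metrics.classification_report(y_test, y_pred)` would give comparable figures from the raw predictions.

```python
import numpy as np

labels = ['acc', 'good', 'unacc', 'vgood']
# Rows: predicted class, columns: actual class (copied from the crosstab above)
cm = np.array([
    [76, 0,   0,  1],
    [ 3, 7,   0,  0],
    [ 2, 0, 240,  0],
    [ 0, 0,   0, 17],
])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all test samples
recall = np.diag(cm) / cm.sum(axis=0)       # share of each actual class that was found
precision = np.diag(cm) / cm.sum(axis=1)    # share of each predicted class that was right

print(f'accuracy = {accuracy:.4f}')
for name, p, r in zip(labels, precision, recall):
    print(f'{name:>5}: precision={p:.3f} recall={r:.3f}')
```

Here 340 of the 346 test samples lie on the diagonal, which gives a test accuracy of about 0.983, slightly below the cross-validated score reported by the grid search.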
