# Support Vector Classification with SKLearn

Written by [_Mark (Zixuan) Song_](https://marksong.tech)
- - -

To keep the code concise, this example implements support vector classification using the `SVC` class from the `sklearn` library.

# Overview

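This example embeds a quantum machine learning (QML) transformer into an SVC pipeline, where it serves as the SVC kernel, and introduces one way of connecting `tensorcircuit` with `scikit-learn`.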

## Setup

Install `scikit-learn`. Download the German Credit dataset [`GCN`](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data) (numeric version) and save it as `german.data-numeric` in the working directory.

```bash
pip install scikit-learn
```

```python
import tensorcircuit as tc
import tensorflow as tf
from sklearn.svm import SVC
from sklearn import metrics
from time import time

# Use TensorFlow as the tensorcircuit backend; K is the backend object
K = tc.set_backend("tensorflow")
```

## Data Processing

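Each row of the dataset contains 24 integer-valued variables followed by a class label. To make the data usable by the model, we first convert the features into a 4x6 or 5x5 matrix (this tutorial uses 5x5: the 24 features are padded with one zero to give 25 values), and then normalize them to the range 0 to 1 by dividing each column by its maximum. The first 800 samples are used for training and the remaining 200 for testing.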

```python
def load_GCN_data():
    # Each row of german.data-numeric holds 24 integer features followed by
    # the class label (1 = good, 2 = bad), separated by a varying number of spaces
    with open("german.data-numeric") as f:
        rows = [[int(v) for v in line.split()] for line in f if line.strip()]
    X_temp = K.convert_to_tensor(rows)  # shape (n_samples, 25), int32
    # Column-wise maxima, used to scale every column into [0, 1]
    X_temp_max = tf.reduce_max(X_temp, axis=0)
    # A zero replaces the label column so that each sample has 25 = 5x5 values
    final_digit = K.cast([0], "int32")
    X = []
    Y = []
    for i in X_temp:
        Y.append(i[-1] - 1)  # map labels {1, 2} -> {0, 1}
        X.append(K.divide(K.concat([i[:24], final_digit], 0), X_temp_max))
    Y = K.cast(K.convert_to_tensor(Y), "float32")
    X = K.cast(K.convert_to_tensor(X), "float32")
    # First 800 samples for training, the remaining ones for testing
    return (X[:800], Y[:800]), (X[800:], Y[800:])


(x_train, y_train), (x_test, y_test) = load_GCN_data()
```
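The padding and scaling performed by `load_GCN_data` can be reproduced with plain NumPy on a toy array, which makes the resulting shapes easy to check. This is a minimal sketch: `pad_and_normalize` is a hypothetical helper (not part of tensorcircuit), and the two toy rows are made up for illustration rather than taken from `german.data-numeric`.

```python
import numpy as np


def pad_and_normalize(raw):
    """Hypothetical helper mirroring load_GCN_data on a NumPy array.

    raw: integer array of shape (n_samples, 25); columns 0-23 are features,
    column 24 is the class label (1 or 2).
    """
    col_max = raw.max(axis=0)  # column-wise maxima (label column included)
    features = raw[:, :24]
    # Replace the label column with 0 so each sample has 25 = 5x5 values
    padded = np.hstack([features, np.zeros((raw.shape[0], 1), dtype=raw.dtype)])
    x = (padded / col_max).astype(np.float32)  # every entry now lies in [0, 1]
    y = (raw[:, -1] - 1).astype(np.float32)  # labels {1, 2} -> {0, 1}
    return x, y


# Two made-up rows (not from the real dataset): 24 features + label
toy = np.array([[2] * 24 + [1], [4] * 24 + [2]])
x, y = pad_and_normalize(toy)
print(x.shape)  # (2, 25)
print(x[0].reshape(5, 5))  # one sample viewed as the 5x5 input matrix
print(y)  # [0. 1.]
```

Because every feature in this dataset is a non-negative integer, dividing by the column maximum is enough to land in [0, 1]; a min-max scaler would be needed if negative values could occur.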

## Quantum Model

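The quantum model takes a 5x5 matrix (the 25 normalized values) as input and outputs the state of 5 qubits. Rows with an even index are encoded as `rx` rotation angles followed by a chain of CNOT gates; rows with an odd index are encoded as `rz` rotation angles. The model is as follows: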

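```python
def quantumTran(inputs):
    # inputs: 25 normalized values, viewed as a 5x5 matrix (row i -> layer i)
    c = tc.Circuit(5)
    for i in range(5):
        if i % 2 == 0:
            # Even layers: encode row i as RX angles, then entangle with a CNOT chain
            for j in range(5):
                c.rx(j, theta=(0 if i * 5 + j >= 25 else inputs[i * 5 + j]))
            for j in range(4):
                c.cnot(j, j + 1)
        else:
            # Odd layers: encode row i as RZ angles
            for j in range(5):
                c.rz(j, theta=(0 if i * 5 + j >= 25 else inputs[i * 5 + j]))
    # Return the 2^5 = 32 complex amplitudes of the output state
    return c.state()


# Wrap the circuit function as a jit-compiled TensorFlow-compatible function
func_qt = tc.interfaces.tensorflow_interface(quantumTran, ydtype=tf.complex64, jit=True)
```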
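To make the encoding concrete without tensorcircuit, the same circuit can be simulated with a small NumPy state-vector routine. This is a sketch under stated assumptions: it uses RX(θ) = exp(-iθX/2), RZ(θ) = exp(-iθZ/2) and treats qubit 0 as the most significant bit of the basis index, which we believe matches tensorcircuit's conventions but have not checked gate by gate against `c.state()`. The helpers `rx`, `rz`, `apply_1q`, `apply_cnot` and `simulate_encoding` are hypothetical names introduced only for this illustration.

```python
import numpy as np

N_QUBITS = 5


def rx(theta):
    # Assumed convention: RX(theta) = exp(-i * theta * X / 2)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=np.complex64)


def rz(theta):
    # Assumed convention: RZ(theta) = exp(-i * theta * Z / 2)
    return np.array(
        [[np.exp(-0.5j * theta), 0], [0, np.exp(0.5j * theta)]], dtype=np.complex64
    )


def apply_1q(state, gate, q):
    # state has shape (2,)*N_QUBITS; contract the gate with axis q
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)


def apply_cnot(state, control, target):
    state = state.copy()
    # Select the slice where the control qubit is |1>
    idx = [slice(None)] * N_QUBITS
    idx[control] = 1
    sub = state[tuple(idx)]
    # The target's axis index shifts down by one if it follows the control axis
    t = target if target < control else target - 1
    # Flip the target qubit (swap its |0> and |1> components) on that slice
    state[tuple(idx)] = np.flip(sub, axis=t).copy()
    return state


def simulate_encoding(inputs):
    """Hypothetical re-implementation of the 5-layer encoding circuit."""
    inputs = np.asarray(inputs, dtype=np.float64)
    state = np.zeros((2,) * N_QUBITS, dtype=np.complex64)
    state[(0,) * N_QUBITS] = 1.0  # start in |00000>
    for i in range(5):
        for j in range(5):
            theta = inputs[i * 5 + j]
            gate = rx(theta) if i % 2 == 0 else rz(theta)
            state = apply_1q(state, gate, j)
        if i % 2 == 0:
            for j in range(4):
                state = apply_cnot(state, j, j + 1)
    return state.reshape(-1)  # 32 amplitudes, qubit 0 most significant


psi = simulate_encoding(np.random.default_rng(0).uniform(0, 1, 25))
print(np.linalg.norm(psi))  # close to 1: the circuit is unitary
```

With all angles set to zero every rotation is the identity and the CNOTs act on |00000>, so the output stays |00000>; this is a quick sanity check for any re-implementation.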

## Wrapping the Quantum Model into an SVC

Wrap the quantum model into an SVC model that `SKLearn` can use, by passing it to `SVC` as a custom (callable) kernel.

```python
def quantum_kernel(quantumTran, data_x, data_y):
    def kernel(x, y):
        # SVC calls the kernel with two sample matrices and expects the
        # Gram matrix of shape (len(x), len(y)) back
        x = K.convert_to_tensor(x)
        y = K.convert_to_tensor(y)
        # Encode every sample as a 5-qubit state (32 complex amplitudes)
        x_qt = tf.stack([quantumTran(x1) for x1 in x])
        y_qt = tf.stack([quantumTran(y1) for y1 in y])
        # Entry (a, b) is |sum_k x_qt[a, k] * y_qt[b, k]|^2 (no complex
        # conjugation), cast to float32 because SVC needs real values
        data_ret = K.cast(K.power(K.abs(x_qt @ K.transpose(y_qt)), 2), "float32")
        return data_ret

    # Use the quantum kernel as a custom (callable) SVC kernel
    clf = SVC(kernel=kernel)
    clf.fit(data_x, data_y)
    return clf
```
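The kernel above squares the magnitude of the pairwise products of amplitudes, and `x_qt @ K.transpose(y_qt)` does not conjugate either factor. The commonly used fidelity kernel is k(x, y) = |⟨ψ(x)|ψ(y)⟩|², which conjugates one of the two states; in TensorFlow that would read `tf.math.conj(x_qt) @ tf.transpose(y_qt)`. The NumPy sketch below compares both variants on random normalized complex vectors that merely stand in for circuit outputs (they are not produced by `quantumTran`); `random_states` and `gram_matrix` are hypothetical helpers.

```python
import numpy as np


def random_states(n, n_qubits=5, seed=0):
    # Random normalized complex vectors standing in for circuit output states
    rng = np.random.default_rng(seed)
    shape = (n, 2**n_qubits)
    psi = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    return psi / np.linalg.norm(psi, axis=1, keepdims=True)


def gram_matrix(x_states, y_states, conjugate=True):
    """Hypothetical helper: entry (a, b) = |<x_a|y_b>|^2 if conjugate,
    otherwise |sum_k x_a[k] * y_b[k]|^2 as in the kernel above."""
    left = np.conj(x_states) if conjugate else x_states
    return (np.abs(left @ y_states.T) ** 2).astype(np.float32)


states = random_states(4)
k_fid = gram_matrix(states, states, conjugate=True)
k_raw = gram_matrix(states, states, conjugate=False)
print(np.diag(k_fid))  # (close to) all ones: each state has fidelity 1 with itself
print(np.diag(k_raw))  # generally not ones
```

The conjugated version is the trace inner product of the density matrices |ψ⟩⟨ψ|, so its Gram matrix is symmetric and positive semidefinite, which is the property SVC expects from a kernel.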

## Creating Traditional SVC Models

```python
def standard_kernel(data_x, data_y, method):
    methods = ['linear', 'poly', 'rbf', 'sigmoid']
    if method not in methods:
        raise ValueError("method must be one of %r." % methods)
    clf = SVC(kernel=method)
    clf.fit(data_x, data_y)
    return clf
```
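For reference, scikit-learn's SVC documentation defines the four built-in kernels as below, where γ is `gamma`, r is `coef0` and d is `degree`. With the defaults used by `standard_kernel`, `gamma="scale"` (that is, 1 / (n_features · X.var())), `coef0=0` and `degree=3`. The last line restates, in the same notation, what `quantum_kernel` computes, with ψ(x) the 32-amplitude state returned by `quantumTran`.

```latex
\begin{aligned}
k_{\text{linear}}(x, x') &= \langle x, x' \rangle \\
k_{\text{poly}}(x, x') &= \left(\gamma \langle x, x' \rangle + r\right)^{d} \\
k_{\text{rbf}}(x, x') &= \exp\left(-\gamma \lVert x - x' \rVert^{2}\right) \\
k_{\text{sigmoid}}(x, x') &= \tanh\left(\gamma \langle x, x' \rangle + r\right) \\
k_{\text{qml}}(x, x') &= \Bigl|\sum_{k=0}^{31} \psi_k(x)\, \psi_k(x')\Bigr|^{2}
\end{aligned}
```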

## Test and Comparison

Test the quantum SVC model and compare it with the traditional SVC models.

```python
methods = ['linear', 'poly', 'rbf', 'sigmoid']

for method in methods:

    print()
    t = time()

    k = standard_kernel(data_x=x_train, data_y=y_train, method=method)
    y_pred = k.predict(x_test)
    print("Accuracy:(%s as kernel)" % method,metrics.accuracy_score(y_test, y_pred))

    print("time:",time()-t,'seconds')

print()
t = time()

k = quantum_kernel(quantumTran=func_qt, data_x=x_train, data_y=y_train)
y_pred = k.predict(x_test)
print("Accuracy:(qml as kernel)",metrics.accuracy_score(y_test, y_pred))

print("time:",time()-t,'seconds')
```


```
Accuracy:(linear as kernel) 0.79
time: 0.009594917297363281 seconds

Accuracy:(poly as kernel) 0.77
time: 0.010785818099975586 seconds

Accuracy:(rbf as kernel) 0.775
time: 0.012056112289428711 seconds

Accuracy:(sigmoid as kernel) 0.565
time: 0.017444133758544922 seconds

Accuracy:(qml as kernel) 0.635
time: 6.606667995452881 seconds
```
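Each call to the callable kernel re-simulates the circuits for both of its arguments, and `SVC` calls it in `fit` as well as in `predict`. One alternative, sketched here under the assumption that the state vectors have already been computed once, is scikit-learn's `kernel="precomputed"` mode: `fit` receives the train-by-train Gram matrix and `predict` receives the test-by-train Gram matrix. The random states and alternating labels below are stand-ins, so no accuracy or timing conclusions follow from this sketch; `fidelity_gram` and `random_states` are hypothetical helpers.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)


def random_states(n, dim=32):
    # Stand-ins for pre-simulated 5-qubit states (32 amplitudes each)
    psi = rng.normal(size=(n, dim)) + 1j * rng.normal(size=(n, dim))
    return psi / np.linalg.norm(psi, axis=1, keepdims=True)


def fidelity_gram(a_states, b_states):
    # Entry (i, j) = |<a_i|b_j>|^2 for rows of complex state vectors
    return np.abs(np.conj(a_states) @ b_states.T) ** 2


train_states = random_states(40)
test_states = random_states(10)
y_train_toy = np.arange(40) % 2  # toy labels containing both classes

clf = SVC(kernel="precomputed")
# fit: Gram matrix between training samples, shape (40, 40)
clf.fit(fidelity_gram(train_states, train_states), y_train_toy)
# predict: Gram matrix between test and training samples, shape (10, 40)
y_pred_toy = clf.predict(fidelity_gram(test_states, train_states))
print(y_pred_toy)
```

In the tutorial's setting, `train_states` and `test_states` would be obtained by calling the wrapped circuit once per sample, so the simulation cost is paid once instead of on every kernel evaluation.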


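## Limitations of `SKLearn`

Because of limitations in `SKLearn`, its `SVC` is not fully compatible with quantum machine learning (QML).

This is because the QML model outputs complex numbers (the amplitudes of the quantum state vector), whereas `SKLearn` only accepts floating-point numbers. The QML output therefore has to be converted to floating-point numbers before it can be used by SVC (here by taking the squared magnitude), which may cause a loss of precision.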

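## Conclusion

Because of the limitations of `SKLearn`, the quantum SVC falls behind the traditional SVC in both accuracy and speed in this test: it reached an accuracy of 0.635, below the linear, poly and rbf kernels (0.77 to 0.79) though above the sigmoid kernel (0.565), and it took about 6.6 seconds, compared with roughly 0.01 to 0.02 seconds for each traditional kernel (the quantum circuits here are simulated on classical hardware). If this limitation were removed, the quantum SVC might outperform the traditional SVC in accuracy.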