## Description:
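> This project is handwritten digit recognition, again based on KNN.<br>
> Each digit is stored as a pixel matrix, so the matrix has to be processed and converted into a vector.
>
> We first implement KNN by hand, then implement it again by calling the sklearn package.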

### Import the required packages

In [10]:
from os import listdir
import numpy as np


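### Load the dataset
> The dataset is stored in two subdirectories, one for the training set and one for the test set. The training set contains about 2000 examples and the test set about 900 samples.
> 
> To fit the way numpy handles data, each image's pixel matrix must first be converted into a one-dimensional vector. Every image is 32*32, so the vector has size 1*1024.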

In [15]:
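# Convert an image into a vector representation

def img2Vector(filename):
    returnVect = np.zeros((1, 1024))
    with open(filename) as fr:             # the file is closed once it has been read
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[0, 32*i+j] = int(lineStr[j])   # pixel (i, j) goes to index 32*i + j
    return returnVect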
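The nested loop above can also be written with array operations. The sketch below is one possible alternative, not the notebook's own method: `img2vector_fast` is a hypothetical name, and it assumes every file holds 32 lines of 32 characters, each `0` or `1`. It writes a small temporary file so that it runs without the dataset.

```python
import os
import tempfile

import numpy as np


def img2vector_fast(filename, size=32):
    """Read a size x size text image of '0'/'1' characters into a (1, size*size) vector."""
    with open(filename) as fr:
        # Keep only the first `size` characters of the first `size` lines.
        rows = [[int(ch) for ch in line[:size]] for line in fr.readlines()[:size]]
    # reshape(1, -1) flattens row by row, matching the index 32*i + j of the loop version.
    return np.array(rows, dtype=float).reshape(1, -1)


# Hypothetical 32x32 image: all zeros except a diagonal of ones.
demo_lines = ["".join("1" if i == j else "0" for j in range(32)) for i in range(32)]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(demo_lines) + "\n")
    demo_path = tmp.name

demo_vec = img2vector_fast(demo_path)
os.remove(demo_path)
print(demo_vec.shape)       # (1, 1024)
print(int(demo_vec.sum()))  # 32
```

Row-major flattening is what keeps the two versions consistent: pixel (i, j) lands at index 32*i + j in both.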

In [12]:
# Implement the k-nearest-neighbors algorithm
def KNN(inX, dataSet, labels, k):
    """
    :param inX: input vector to classify
    :param dataSet: training samples, a 2-D matrix; rows are training samples, columns are features
    :param labels: label vector
    :param k: number of nearest neighbors to use
    :return: predicted class of the input vector
    """
    dataSetSize = dataSet.shape[0]               # number of training samples
    diffMat = inX - dataSet            # NumPy broadcasting: difference between the input vector and every training sample
    sqDiffMat = np.square(diffMat)
    sqDistances = sqDiffMat.sum(axis=1)
    distances = np.sqrt(sqDistances)             # Euclidean distance to each training sample
    sortedDistIndics = distances.argsort()    # indices sorted by ascending distance, since we need the k smallest
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndics[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount, key=classCount.__getitem__, reverse=True)  # sort the keys by vote count, descending

    return sortedClassCount[0]
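To watch the distance computation and the majority vote on something small, here is a toy check. It restates the same idea compactly and is not a replacement for `KNN` above: the name `knn_predict` and the four 2-D points are made up for illustration, and the vote uses `collections.Counter` instead of a hand-built dictionary.

```python
from collections import Counter

import numpy as np


def knn_predict(in_x, data_set, labels, k):
    """Return the majority label among the k training rows closest to in_x."""
    # Broadcasting subtracts in_x from every row; the row-wise sum gives squared distances.
    distances = np.sqrt(np.square(data_set - in_x).sum(axis=1))
    nearest = distances.argsort()[:k]            # indices of the k smallest distances
    votes = Counter(labels[i] for i in nearest)  # count labels among those neighbors
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two points near (1, 1) labelled 'A', two near (0, 0) labelled 'B'.
toy_data = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
toy_labels = ['A', 'A', 'B', 'B']

print(knn_predict(np.array([0.1, 0.2]), toy_data, toy_labels, 3))  # B
print(knn_predict(np.array([0.9, 0.9]), toy_data, toy_labels, 3))  # A
```

With k=3 each query has two neighbors from its own cluster and one from the other, so the vote is 2 to 1 in both cases.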

In [21]:
# Test code for the handwritten digit recognition system
def handwritingClassTest():
    hwLabels = []
    trainingFileList = listdir('digits/trainingDigits')
    m = len(trainingFileList)
    trainingMat = np.zeros((m, 1024))
    for i in range(m):
        fileNameStr = trainingFileList[i]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        hwLabels.append(classNumStr)
        trainingMat[i,:] = img2Vector('digits/trainingDigits/%s' % fileNameStr)
    
    testFileList = listdir('digits/testDigits')
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2Vector('digits/testDigits/%s' % fileNameStr)
        classifierResult = KNN(vectorUnderTest, trainingMat, hwLabels, 3)
        print("the classifier came back with:%d, the real answer is:%d" % (classifierResult, classNumStr))
        if (classifierResult != classNumStr):
            errorCount += 1.0
    print("\ntotal number of errors is : %d" % errorCount)
    print("\nthe total error rate is : %f" % (errorCount/float(mTest)))

In [22]:
if __name__ == "__main__":
    handwritingClassTest()

the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
the classifier came back with:0, the real answer is:0
...

the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:7, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
the classifier came back with:1, the real answer is:1
...

the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
the classifier came back with:3, the real answer is:3
...

the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
the classifier came back with:5, the real answer is:5
...

the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
the classifier came back with:6, the real answer is:6
...

the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:1, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:1, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
the classifier came back with:8, the real answer is:8
...
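The error rate that `handwritingClassTest` prints at the end is the number of misclassified samples divided by the number of test samples. A small sketch of that arithmetic follows; the predictions and labels are made up for illustration and are not taken from the run above.

```python
import numpy as np

# Hypothetical predictions and true labels, not taken from the run above.
predicted = np.array([0, 0, 7, 1, 8, 1])
actual = np.array([0, 0, 1, 1, 8, 8])

error_count = int((predicted != actual).sum())  # number of mismatches
error_rate = error_count / len(actual)          # fraction misclassified
print(error_count, round(error_rate, 3))        # 2 0.333
```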

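### Summary:
>* KNN can recognize handwritten digits with fairly high accuracy. The main point above is how to process the image data, namely the `img2Vector` technique. Below I give a simpler alternative, a trick learned from Andrew Ng's deep learning course: combine `reshape` with `-1`, then transpose, and the matrix becomes a vector directly. I try this approach in the sklearn version further on.
>
>* The second thing to learn is file handling. The dataset is spread across many files, so functions from the `os` module sometimes need to be combined with `split` to cut the file names appropriately and obtain the labels and features. This approach works well and is worth learning; the sklearn version further on still uses it to build the training and test sets.
>
>* In the sklearn version further on, I will use KNN, or try some other classification or ensemble methods, to see whether the results improve.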
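The file-handling step mentioned in the summary can be tried without the dataset. This is a minimal sketch, assuming file names follow the `<digit>_<index>.txt` pattern that the `split` calls in `handwritingClassTest` imply; `label_from_filename` and the sample names are hypothetical.

```python
def label_from_filename(file_name):
    """Extract the digit label from a name such as '3_45.txt'."""
    stem = file_name.split('.')[0]    # '3_45.txt' -> '3_45'
    return int(stem.split('_')[0])    # '3_45'     -> 3


# Hypothetical names standing in for the result of os.listdir(...).
sample_files = ['0_12.txt', '3_45.txt', '9_7.txt']
sample_labels = [label_from_filename(name) for name in sample_files]
print(sample_labels)  # [0, 3, 9]
```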


In [9]:
# The example below shows two ways to convert a matrix into a vector
b = np.zeros((1, 9))
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for i in range(3):
    for j in range(3):
        b[0, 3*i+j] = a[i][j]
print(b)

# Second way
c = a.reshape(9, -1).T
print(c)

[[1. 2. 3. 4. 5. 6. 7. 8. 9.]]
[[1 2 3 4 5 6 7 8 9]]
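For comparison, here are a few other NumPy ways to reach the same row vector. This is an aside rather than part of the original notebook: `reshape(1, -1)` produces the row vector directly, without the transpose, and `ravel()` returns a flat 1-D array.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_via_transpose = a.reshape(9, -1).T   # (9, 1) column, transposed to (1, 9)
row_direct = a.reshape(1, -1)            # (1, 9) in one step
flat = a.ravel()                         # 1-D array of length 9

print(row_via_transpose.shape, row_direct.shape, flat.shape)  # (1, 9) (1, 9) (9,)
print(np.array_equal(row_via_transpose, row_direct))          # True
```

Note that the loop version above produces floats because `b` starts as `np.zeros`, while the reshape versions keep the integer dtype of `a`.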
