* Paper title: Classification of functional data: A segmentation approach
* Authors: Bin Li ∗, Qingzhao Yu
* Louisiana State University, 70803 Baton Rouge, LA, United States

# Paper overview

Main method:
* FSDA (functional segment discriminant analysis): essentially a combination of LDA and SVM.

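Advantages:
* Particularly useful for irregular functional data (spatial heterogeneity and spikes)
* Reduces computation by using only a fraction of the spectrum
* Identifies important predictors and extracts features
* Flexible: easily incorporates information from other data sources or prior knowledge from the investigators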

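## Method summary

**LDA** (linear discriminant analysis):

Advantages:

* Fairly **robust** to non-normal distributions and to unequal class covariances

Disadvantages:

* Too rigid for highly nonlinear class boundaries
* Too flexible when there are many highly correlated predictors, as is common in functional data analysis (FDA)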

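**SVM**:

Advantages:

* Extends easily to nonlinear spaces

Disadvantages:

* Uses all variables indiscriminately, so redundant variables enter the classifier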

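**FSDA** (the method of this paper):

* LDA performs the data reduction (dimension reduction); SVM is the classifier
* Classifies well both irregular functional data and data without pronounced local features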
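The two-stage idea above (LDA for reduction, SVM for classification) can be sketched as follows. This is a minimal illustration and not the paper's algorithm: the equal-width split into segments, the helper names `fit_segment_ldas` and `segment_scores`, the RBF kernel, and the synthetic curves are all assumptions made here, and scikit-learn is assumed to be installed.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC


def fit_segment_ldas(X, y, n_segments):
    """Fit one LDA per contiguous block of columns (equal-width blocks are an assumption)."""
    blocks = np.array_split(np.arange(X.shape[1]), n_segments)
    ldas = [LinearDiscriminantAnalysis().fit(X[:, cols], y) for cols in blocks]
    return blocks, ldas


def segment_scores(X, blocks, ldas):
    """Replace each segment by its LDA discriminant scores, stacked column-wise."""
    return np.hstack([lda.transform(X[:, cols]) for cols, lda in zip(blocks, ldas)])


# Synthetic two-class "curves": the classes differ only on grid points 20..29.
rng = np.random.default_rng(0)
n, p = 200, 64
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, 20:30] += 1.5 * y[:, None]

train, test = np.arange(150), np.arange(150, n)

# Stage 1: reduce each segment to its discriminant score (fitted on training data only).
blocks, ldas = fit_segment_ldas(X[train], y[train], n_segments=8)
Z_train = segment_scores(X[train], blocks, ldas)
Z_test = segment_scores(X[test], blocks, ldas)

# Stage 2: train an SVM on the reduced features.
svm = SVC(kernel="rbf").fit(Z_train, y[train])
accuracy = svm.score(Z_test, y[test])
print(Z_train.shape, accuracy)
```

With two classes each segment contributes a single discriminant score, so the SVM sees 8 features instead of 64 grid points.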

# Code implementation
## Loading the data

$\color{red}{\text{Do different speakers need to be handled separately?}}$

In [87]:
# Show at most 5 rows when a DataFrame or Series is displayed
pd.set_option('display.max_rows',5)

In [50]:
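import numpy as np
import pandas as pd
# Read the phoneme data; the first column (row.names) becomes the index
df = pd.read_csv('F:\\Data and code\\data\\FSDA\\phoneme.data',index_col = 0)
# Optionally drop the last column (speaker):
# df = df.iloc[:,:-1]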

In [57]:
# Speaker string for the row labelled 1000
df.loc[1000,'speaker']

'train.dr3.mgaf0.sa1'

In [76]:
df

row.names,x.1,x.2,x.3,x.4,x.5,x.6,x.7,x.8,x.9,x.10,...,x.249,x.250,x.251,x.252,x.253,x.254,x.255,x.256,g,speaker
1,9.85770,9.20711,9.81689,9.01692,9.05675,8.92518,11.28308,11.52980,10.79713,9.04747,...,12.68076,11.20767,13.69394,13.72055,12.16628,12.92489,12.51195,9.75527,sh,train.dr1.mcpm0.sa1
2,13.23079,14.19189,15.34428,18.11737,19.53875,18.32726,17.34169,17.16861,19.63557,20.15212,...,8.45714,8.77266,9.59717,8.45336,7.57730,5.38504,9.43063,8.59328,iy,train.dr1.mcpm0.sa1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4508,8.39388,9.84770,16.24534,17.35311,14.80537,12.72429,17.01145,17.54733,14.35809,13.65718,...,4.57875,7.91262,8.08014,9.25111,9.56086,9.37979,6.83916,8.54817,ao,test.dr8.mslb0.sa1
4509,8.14032,9.93753,16.30187,17.31425,14.40116,13.52353,16.85938,17.14016,13.06426,15.32220,...,5.27574,6.95050,7.83462,7.96455,7.26886,7.08945,7.72929,6.42167,ao,test.dr8.mslb0.sa1


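* `speaker` takes 437 distinct values, too many to split on, so the data were presumably not separated by speaker; $\color{red}{\text{try the unsplit case first}}$ [about 2 examples of each phoneme from each of 50 speakers]
* `test.dr1.mdab0.sa1`: row 3341 (positional index 3340) is the first test-set row
* Training set: 3340 rows, `df.iloc[:3340]`
* Test set: 1169 rows, `df.iloc[3340:]`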
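As a cross-check on the positional split noted above, the same partition can be derived from the `train.` / `test.` prefix of the `speaker` string. The sketch below runs on a small hand-made table (`toy`, a stand-in invented here, not the real file); the variable names are illustrative.

```python
import pandas as pd

# Toy stand-in for the phoneme table; the real file is not loaded here.
toy = pd.DataFrame(
    {
        "x.1": [9.8, 13.2, 12.5, 9.9, 8.1],
        "g": ["sh", "iy", "ao", "sh", "ao"],
        "speaker": [
            "train.dr1.mcpm0.sa1",
            "train.dr1.mcpm0.sa1",
            "train.dr8.mtcs0.sa1",
            "test.dr1.mdab0.sa1",
            "test.dr8.mslb0.sa1",
        ],
    },
    index=pd.RangeIndex(1, 6, name="row.names"),
)

# Boolean mask: True for rows whose speaker string starts with "train."
is_train = toy["speaker"].str.startswith("train.")
train_df, test_df = toy[is_train], toy[~is_train]

# Position of the first test row (0-based).
first_test_pos = int((~is_train).to_numpy().argmax())
print(len(train_df), len(test_df), first_test_pos)
```

On the real `df`, the notes above suggest `first_test_pos` should come out as 3340 and `train_df` should match `df.iloc[:3340]`; that only holds if all training rows precede all test rows in the file.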

In [90]:
df['speaker'].value_counts()

train.dr5.mjrg0.sa1    12
train.dr5.mwac0.sa1    12
                       ..
train.dr6.mtxs0.sa1     7
train.dr2.msat0.sa1     7
Name: speaker, Length: 437, dtype: int64
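To look at the speaker labels in more detail, the dotted string can be split into its parts. The column names below (`split`, `dialect_region`, `speaker_id`, `utterance`) are an interpretation of the label layout assumed here, not something stated in the data file, and the four example strings are copied from the outputs in this notebook.

```python
import pandas as pd

# A few speaker strings as they appear in the notebook output.
speakers = pd.Series(
    [
        "train.dr5.mjrg0.sa1",
        "train.dr5.mjrg0.sa1",
        "train.dr5.mwac0.sa1",
        "test.dr1.mdab0.sa1",
    ],
    name="speaker",
)

# Split on "." into four columns; the names are assumptions made for this sketch.
parts = speakers.str.split(".", expand=True)
parts.columns = ["split", "dialect_region", "speaker_id", "utterance"]

# Number of distinct speaker IDs within the train and test portions.
ids_per_split = parts.groupby("split")["speaker_id"].nunique()
print(ids_per_split)
```

Applied to `df['speaker']`, the same steps would show how the 437 distinct strings are distributed over the training and test portions.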

In [88]:
df.iloc[3339,:]

x.1                   12.50446
x.2                   13.94449
                  ...         
g                           ao
speaker    train.dr8.mtcs0.sa1
Name: 3340, Length: 258, dtype: object

In [89]:
df.iloc[3340,:]

x.1                   9.92796
x.2                   9.74454
                  ...        
g                          sh
speaker    test.dr1.mdab0.sa1
Name: 3341, Length: 258, dtype: object

In [77]:
4509 - 3340  # total rows minus training rows = number of test rows

1169

In [78]:
df.iloc[:3340]

row.names,x.1,x.2,x.3,x.4,x.5,x.6,x.7,x.8,x.9,x.10,...,x.249,x.250,x.251,x.252,x.253,x.254,x.255,x.256,g,speaker
1,9.85770,9.20711,9.81689,9.01692,9.05675,8.92518,11.28308,11.52980,10.79713,9.04747,...,12.68076,11.20767,13.69394,13.72055,12.16628,12.92489,12.51195,9.75527,sh,train.dr1.mcpm0.sa1
2,13.23079,14.19189,15.34428,18.11737,19.53875,18.32726,17.34169,17.16861,19.63557,20.15212,...,8.45714,8.77266,9.59717,8.45336,7.57730,5.38504,9.43063,8.59328,iy,train.dr1.mcpm0.sa1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3339,12.08473,13.30547,16.92965,17.14066,13.09360,17.26574,19.11256,17.70642,14.61316,18.45107,...,3.52400,5.13305,6.11436,6.73741,4.14095,5.68638,6.17875,7.57668,ao,train.dr8.mtcs0.sa1
3340,12.50446,13.94449,16.18002,16.87066,11.25393,16.60865,17.89792,16.89914,12.85608,18.02536,...,7.10767,7.20332,7.26288,6.06767,7.92481,7.41997,4.88609,4.94594,ao,train.dr8.mtcs0.sa1


In [79]:
df.iloc[3340:]

row.names,x.1,x.2,x.3,x.4,x.5,x.6,x.7,x.8,x.9,x.10,...,x.249,x.250,x.251,x.252,x.253,x.254,x.255,x.256,g,speaker
3341,9.92796,9.74454,8.74148,10.31529,12.60091,13.20503,12.91027,12.83584,10.10106,12.63221,...,14.30416,14.25287,14.21510,14.61223,11.68249,11.97492,10.91413,10.05628,sh,test.dr1.mdab0.sa1
3342,9.94144,11.98050,11.22016,12.35993,18.01395,18.13849,15.14025,12.22858,14.88516,19.31814,...,10.20457,8.82700,8.23437,8.79296,8.53510,8.77193,8.22322,6.99881,iy,test.dr1.mdab0.sa1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4508,8.39388,9.84770,16.24534,17.35311,14.80537,12.72429,17.01145,17.54733,14.35809,13.65718,...,4.57875,7.91262,8.08014,9.25111,9.56086,9.37979,6.83916,8.54817,ao,test.dr8.mslb0.sa1
4509,8.14032,9.93753,16.30187,17.31425,14.40116,13.52353,16.85938,17.14016,13.06426,15.32220,...,5.27574,6.95050,7.83462,7.96455,7.26886,7.08945,7.72929,6.42167,ao,test.dr8.mslb0.sa1
