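# Analysis of Variance (ANOVA)
Disputes often arise between consumers and the producers, sellers, or service providers they deal with, and consumers frequently file complaints with the consumer association when they do. To evaluate the service quality of several industries, the consumer association drew samples from four of them: 7 companies from retail (零售业), 6 from tourism (旅游业), 5 from airlines (航空公司), and 5 from home appliance manufacturing (家电制造业). It then recorded the number of consumer complaints received by each of these 23 companies over one year; the results are shown in the table below. Based on the Excel output, analyze whether service quality differs significantly among the four industries.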

In [201]:
import pandas as pd
import numpy as np
import scipy.stats as stats

In [4]:
data = pd.read_excel("data/例10.1.xlsx",header=0)
data

Unnamed: 0,零售业,旅游业,航空公司,家电制造业
0,57,68.0,31.0,44.0
1,66,39.0,49.0,51.0
2,49,29.0,21.0,65.0
3,40,45.0,34.0,77.0
4,34,56.0,40.0,58.0
5,53,51.0,,
6,44,,,


## One-way ANOVA
### 1. State the hypotheses
H0: mu1 = mu2 = mu3 = mu4  
H1: mu1, mu2, mu3, mu4 are not all equal  
### 2. Construct the test statistic

In [24]:
x_ = data.mean()
x_

零售业      49.0
旅游业      48.0
航空公司     35.0
家电制造业    59.0
dtype: float64

In [155]:
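# Flatten all non-missing observations first; the grand mean below needs their count.
dataList = np.array(data)[pd.notnull(data)]
# Grand mean: sum of all observations divided by the number of observations (23).
x__ = data.sum().sum()/len(dataList)
print(x__)
print(dataList)
# Sample size of each group (column), ignoring missing cells.
length = []
for i in range(0,4):
    length.append(data.iloc[:,i].dropna().shape[0])
length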

47.869565217391305
[57. 68. 31. 44. 66. 39. 49. 51. 49. 29. 21. 65. 40. 45. 34. 77. 34. 56.
 40. 58. 53. 51. 44.]


[7, 6, 5, 5]

In [194]:
# Total sum of squares: SST = ∑∑(x_ij - x__)^2
SST = (((dataList-x__)**2).sum())
SST

4164.608695652174

In [195]:
# Between-group sum of squares: SSA = ∑ n_i (x_i - x__)^2
SSA= ((x_-x__)**2*length).sum()
SSA

1456.608695652174

In [197]:
# Within-group sum of squares: SSE = ∑∑(x_ij - x_i)^2
# Subtracting x_ aligns each column with its own group mean; missing cells stay NaN and are skipped by sum().
SSE = ((data - x_)**2).sum().sum()
SSE

2708.0

In [198]:
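# Check the decomposition SST = SSA + SSE; np.isclose avoids relying on exact floating-point equality.
np.isclose(SST, SSA+SSE)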

True
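
For reference, the three sums of squares computed above can be written out as follows. This is a notational sketch, assuming k = 4 groups with sizes n_i = 7, 6, 5, 5, where \bar{x}_i corresponds to `x_` (the group means) and \bar{\bar{x}} to `x__` (the grand mean):

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{\bar{x}}\right)^2,\\
SSA &= \sum_{i=1}^{k} n_i\left(\bar{x}_i-\bar{\bar{x}}\right)^2,\\
SSE &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2,\\
SST &= SSA + SSE .
\end{aligned}
```

With the values above, 4164.61 ≈ 1456.61 + 2708.00.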

In [199]:
MSA = SSA / (4 -1)
MSA

485.536231884058

In [200]:
MSE = SSE / (len(dataList)- 4)
MSE

142.52631578947367

In [204]:
F = MSA /MSE
F

3.4066426904716036
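
As a cross-check on the manual calculation, the same F statistic can be obtained from `scipy.stats.f_oneway`. The sketch below is not part of the original notebook: it assumes the complaint counts are typed in from the table above instead of being read from the Excel file, and the English variable names are chosen here for illustration.

```python
from scipy import stats

# Complaint counts copied from the table above (missing cells dropped).
retail = [57, 66, 49, 40, 34, 53, 44]
tourism = [68, 39, 29, 45, 56, 51]
airlines = [31, 49, 21, 34, 40]
appliances = [44, 51, 65, 77, 58]

# f_oneway runs a one-way ANOVA and returns the F statistic and its p-value.
f_stat, p_value = stats.f_oneway(retail, tourism, airlines, appliances)
print(f_stat, p_value)
```

The printed F statistic should agree with the manually computed `F` up to floating-point rounding.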

### 3. Make the statistical decision


In [206]:
F0 = stats.f.isf(0.05,3,23-4)
F0

3.127350005113399

In [212]:
F>F0

True
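
The comparison of F with the critical value F0 can also be expressed as a p-value rule: reject H0 when the upper-tail probability of the observed F is below the significance level. A minimal sketch, assuming the F value computed above and the same degrees of freedom (3 and 19); the names `df_between`, `df_within`, and `reject_h0` are introduced here for illustration only.

```python
from scipy import stats

F = 3.4066426904716036             # F statistic computed above
df_between, df_within = 4 - 1, 23 - 4
alpha = 0.05

# Upper-tail probability of the F(3, 19) distribution at the observed F.
p_value = stats.f.sf(F, df_between, df_within)
reject_h0 = p_value < alpha
print(p_value, reject_h0)
```

Both rules lead to the same decision, because `stats.f.sf` is the inverse of the `stats.f.isf` call used to obtain F0.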

In [215]:
R2= SSA/SST
R2

0.34975883740838953

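Since F = 3.4066 > F0 = 3.1274 at the 0.05 significance level, reject H0: the mean numbers of complaints are not all equal across the four industries. R2 = 0.3498 indicates that industry accounts for about 35% of the total variation in the number of complaints.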

## Multiple comparisons in ANOVA

### 1. State the hypotheses
Test 1: H0: mu1 = mu2, H1: mu1 != mu2  
Test 2: H0: mu1 = mu3, H1: mu1 != mu3  
Test 3: H0: mu1 = mu4, H1: mu1 != mu4  
Test 4: H0: mu2 = mu3, H1: mu2 != mu3  
Test 5: H0: mu2 = mu4, H1: mu2 != mu4  
Test 6: H0: mu3 = mu4, H1: mu3 != mu4  

In [219]:
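# Absolute difference between the sample means of each pair of groups.
# .iloc selects by position; plain x_[0] on a label-indexed Series is deprecated in recent pandas.
x1 = abs(x_.iloc[0] - x_.iloc[1])
print(x1)
x2 = abs(x_.iloc[0] - x_.iloc[2])
print(x2)
x3 = abs(x_.iloc[0] - x_.iloc[3])
print(x3)
x4 = abs(x_.iloc[1] - x_.iloc[2])
print(x4)
x5 = abs(x_.iloc[1] - x_.iloc[3])
print(x5)
x6 = abs(x_.iloc[2] - x_.iloc[3])
print(x6)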

1.0
14.0
10.0
13.0
11.0
24.0


In [223]:
ta= stats.t.isf(0.05/2,23-4)
ta

2.0930240544082634

In [227]:
# Least significant difference: LSD = t_(alpha/2) * (MSE * (1/n_i + 1/n_j)) ** 0.5
LSD1 = ta*(MSE*(1/length[0]+1/length[1]))**.5
print(LSD1)
LSD2 = ta*(MSE*(1/length[0]+1/length[2]))**.5
print(LSD2)
LSD3 = ta*(MSE*(1/length[0]+1/length[3]))**.5
print(LSD3)
LSD4 = ta*(MSE*(1/length[1]+1/length[2]))**.5
print(LSD4)
LSD5 = ta*(MSE*(1/length[1]+1/length[3]))**.5
print(LSD5)
LSD6 = ta*(MSE*(1/length[2]+1/length[3]))**.5
print(LSD6)

13.901727781081766
14.63114619914529
14.63114619914529
15.13064578318105
15.13064578318105
15.803444106192725


In [228]:
print(x1<LSD1)
print(x2<LSD2)
print(x3<LSD3)
print(x4<LSD4)
print(x5<LSD5)
print(x6<LSD6)

True
True
True
True
True
False


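Fail to reject H0 for test 1  
Fail to reject H0 for test 2  
Fail to reject H0 for test 3  
Fail to reject H0 for test 4  
Fail to reject H0 for test 5  
Reject H0 for test 6: the number of complaints differs significantly between airlines (航空公司) and home appliance manufacturing (家电制造业).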
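
The six pairwise LSD comparisons above can also be written as a single loop. The following is a sketch under stated assumptions, not the notebook's original code: the data are typed in from the table instead of being read from the Excel file, and the dictionary keys and variable names are chosen here for illustration.

```python
from itertools import combinations
from scipy import stats

# Complaint counts copied from the table above (missing cells dropped).
groups = {
    "retail": [57, 66, 49, 40, 34, 53, 44],
    "tourism": [68, 39, 29, 45, 56, 51],
    "airlines": [31, 49, 21, 34, 40],
    "appliances": [44, 51, 65, 77, 58],
}

n_total = sum(len(v) for v in groups.values())   # 23 observations
k = len(groups)                                  # 4 groups
means = {name: sum(v) / len(v) for name, v in groups.items()}

# Within-group mean square (MSE), pooled over all groups.
sse = sum((x - means[name]) ** 2 for name, v in groups.items() for x in v)
mse = sse / (n_total - k)

# Two-sided critical t value at alpha = 0.05 with n - k degrees of freedom.
t_crit = stats.t.isf(0.05 / 2, n_total - k)

significant = []
for a, b in combinations(groups, 2):
    diff = abs(means[a] - means[b])
    lsd = t_crit * (mse * (1 / len(groups[a]) + 1 / len(groups[b]))) ** 0.5
    if diff > lsd:
        significant.append((a, b))
    print(f"{a} vs {b}: |diff| = {diff:.1f}, LSD = {lsd:.3f}, significant = {diff > lsd}")
```

Looping over `combinations` keeps each pair's difference and its LSD threshold together, so the procedure carries over to a different number of groups without writing new variables by hand.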