In [4]:
import pandas as pd
import numpy as np
from scipy.stats import norm

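## Solving the average-score problem

Naive ways to compute an average score:

1. Use the mean rating directly. The problem is that the actual distribution of ratings may look nothing like what the mean suggests.
2. Use the proportion of positive ratings directly. This tends to push items with very few ratings to the top of the recommendations.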
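
To make the second problem concrete, here is a minimal sketch with made-up data. The item names and counts are hypothetical, and `positive_ratio` is just an illustrative helper, not part of any library.

```python
# Hypothetical items: (name, positive ratings, total ratings)
items = [
    ("A", 1, 1),      # a single rating, and it happens to be positive
    ("B", 90, 100),
    ("C", 475, 500),
]

def positive_ratio(pos, n):
    """Raw proportion of positive ratings (0.0 when there are no ratings)."""
    return pos / n if n else 0.0

# Rank by raw proportion, best first
ranked = sorted(items, key=lambda it: positive_ratio(it[1], it[2]), reverse=True)
print([name for name, _, _ in ranked])
```

Item `A` comes out on top with a ratio of 1.0, even though that ratio rests on a single rating.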

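Solution:

Use the `Wilson Score`. The question it answers is: given a set of ratings, what are the lower and upper bounds that contain the "true" proportion of positive ratings with 95% confidence? It is computed as: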

$$\frac{\hat{p} + \frac{z^2_{\alpha/2}}{2n} \pm z_{\alpha/2} \sqrt{\left[\hat{p}(1 - \hat{p}) + \frac{z^2_{\alpha/2}}{4n}\right] / n}}{1 + z^2_{\alpha/2}/n}$$

Ruby pseudocode (lower bound only):

```ruby
require 'statistics2'

def ci_lower_bound(pos, n, confidence)
    if n == 0
        return 0
    end
    z = Statistics2.pnormaldist(1-(1-confidence)/2)
    phat = 1.0*pos/n
    (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)
end
```

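Here $\hat{p}$ is the observed proportion of positive ratings, $n$ is the total number of ratings, and $z_{\alpha/2}$ is the $1 - \alpha / 2$ quantile of the standard normal distribution.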

In [22]:
## Solving the average-score problem
def ci_bound(pos, n, confidence=0.95):
    """Return the (lower, upper) Wilson score bounds for pos positives out of n ratings."""
    if n == 0:
        # No ratings: return the uninformative interval so the result is always a tuple
        return (0.0, 1.0)
    z = norm.ppf(1 - (1 - confidence) / 2)
    phat = pos / n
    center = phat + z ** 2 / (2 * n)
    margin = z * np.sqrt((phat * (1 - phat) + z ** 2 / (4 * n)) / n)
    denom = 1 + z ** 2 / n
    return ((center - margin) / denom, (center + margin) / denom)
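
As a usage sketch, the lower bound can serve as a sort key. The block below is self-contained and makes two assumptions of its own: it swaps in the standard library's `statistics.NormalDist().inv_cdf` for `norm.ppf` so that it runs without SciPy, and the items are made-up data. `wilson_lower_bound` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def wilson_lower_bound(pos, n, confidence=0.95):
    """Lower bound of the Wilson score interval for pos positives out of n."""
    if n == 0:
        return 0.0
    # Standard normal quantile; stdlib stand-in for scipy.stats.norm.ppf
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = pos / n
    center = phat + z * z / (2 * n)
    margin = z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (center - margin) / (1 + z * z / n)

# Hypothetical items: (name, positive ratings, total ratings)
items = [("A", 1, 1), ("B", 90, 100), ("C", 475, 500)]

# Rank by the Wilson lower bound, best first
ranked = sorted(items, key=lambda it: wilson_lower_bound(it[1], it[2]), reverse=True)
for name, pos, n in ranked:
    print(name, round(wilson_lower_bound(pos, n), 3))
```

With these numbers the single-rating item `A` drops to last place: its lower bound is only about 0.21, because one rating says very little about the true proportion.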

## Hacker News ranking algorithm



```lisp
; Votes divided by the age in hours to the gravityth power.
; Would be interesting to scale gravity in a slider.

(= gravity* 1.8 timebase* 120 front-threshold* 1 
   nourl-factor* .4 lightweight-factor* .3 )

(def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
  (* (/ (let base (- (scorefn s) 1)
          (if (> base 0) (expt base .8) base))
        (expt (/ (+ (item-age s) timebase*) 60) gravity))
     (if (no (in s!type 'story 'poll))  1
         (blank s!url)                  nourl-factor*
         (lightweight s)                (min lightweight-factor* 
                                             (contro-factor s))
                                        (contro-factor s))))
```

This can be roughly summarized as:

```
Score = (P-1) / (T+2)^G

where,
P = points of an item (the -1 negates the submitter's own vote)
T = time since submission (in hours)
G = Gravity, defaults to 1.8 in news.arc
```
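
The simplified formula above can be sketched in Python as follows. This covers only the summary formula: it leaves out the `.8` exponent on the base score and the type and URL penalty factors that appear in the Arc source. `hn_score` is an illustrative name.

```python
def hn_score(points, age_hours, gravity=1.8):
    """Simplified Hacker News score: (P - 1) / (T + 2) ** G."""
    return (points - 1) / (age_hours + 2) ** gravity

# A fresh story with few points can outrank an older story with many points
fresh = hn_score(points=21, age_hours=1)
old = hn_score(points=101, age_hours=24)
print(fresh > old)
```

Because the denominator grows with age raised to the power `gravity`, a story has to keep collecting votes to hold its position.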


## Reddit ranking algorithm

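$$f(t_s, y, z) = \log_{10} z + \frac{y\,t_s}{45000}$$

where $t_s = t_{post} - t_0$ is the submission time in seconds relative to a fixed reference time $t_0$ (Reddit's open-sourced ranking code uses the Unix timestamp 1134028003, December 8, 2005), $x = Up\_num - Down\_num$ is the net vote count, $y$ is the sign of $x$ ($1$ if $x > 0$, $0$ if $x = 0$, $-1$ if $x < 0$), and $z = |x|$ if $|x| \ge 1$, else $1$.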
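
A minimal Python sketch of this formula, under stated assumptions: $y$ is taken to be the sign of the net vote count, $z$ its magnitude clamped to at least 1, and $t_s$ is measured from a fixed reference epoch rather than from the current time. `hot` and `REDDIT_EPOCH` are illustrative names.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Reference time t_0 as a Unix timestamp (assumption: the value used in
# Reddit's open-sourced ranking code)
REDDIT_EPOCH = 1134028003

def hot(ups, downs, posted_at):
    """Score = log10(z) + y * t_s / 45000 for a timezone-aware datetime."""
    x = ups - downs                          # net votes
    y = 1 if x > 0 else -1 if x < 0 else 0   # sign of the net votes
    z = max(abs(x), 1)                       # magnitude, at least 1
    t_s = posted_at.timestamp() - REDDIT_EPOCH
    return log10(z) + y * t_s / 45000

newer = datetime(2020, 1, 1, 12, 30, tzinfo=timezone.utc)
older = newer - timedelta(seconds=45000)     # 12.5 hours earlier

# Being 45000 seconds newer is worth one order of magnitude in net votes
print(abs(hot(10, 0, newer) - hot(100, 0, older)) < 1e-6)
```

Since the vote term is logarithmic and the time term is linear, the first few votes matter far more than later ones, and newer posts steadily gain an edge over older ones.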