v2ray TLS traffic can be precisely identified by simple signature matching (PoC included) #704

Closed
p4gefau1t opened this issue May 30, 2020 · 121 comments

@p4gefau1t p4gefau1t commented May 30, 2020

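This issue probably does not count as a bug report, but I could not find a suitable template, so I did not use one. Sorry about that.

The conclusion first: the cipher suite field of the TLS Client Hello alone is enough to distinguish v2ray traffic from normal browser traffic very accurately.

PoC (from @DuckSoft): the iptables rule below blocks all outbound v2ray TLS traffic when allowInsecureCiphers is set to false (the default), while leaving other TLS traffic unaffected: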

iptables -I OUTPUT -m string --algo kmp --hex-string "|001ecca8cca9c02fc02bc030c02cc027c013c023c009c014c00a130113031302|" -j DROP
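
To make the rule concrete, here is a minimal Python sketch of the same check. It decodes the hex string from the rule into a length prefix plus a list of 2-byte cipher suite IDs, and does a plain substring search over a payload, which is roughly what the iptables string match does. The function names `decode_signature` and `matches_signature` are hypothetical and exist only for illustration.

```python
from typing import List, Tuple

# Hex string copied verbatim from the iptables rule above.
SIGNATURE = bytes.fromhex(
    "001ecca8cca9c02fc02bc030c02cc027c013c023c009c014c00a130113031302"
)


def decode_signature(sig: bytes) -> Tuple[int, List[int]]:
    """Split the signature into its 2-byte length prefix and 2-byte suite IDs."""
    length = int.from_bytes(sig[:2], "big")
    body = sig[2:]
    suites = [int.from_bytes(body[i:i + 2], "big") for i in range(0, len(body), 2)]
    return length, suites


def matches_signature(payload: bytes) -> bool:
    """Return True if the signature occurs anywhere in the payload."""
    return SIGNATURE in payload


length, suites = decode_signature(SIGNATURE)
print(length, len(suites))           # length prefix in bytes, number of suites
print([hex(s) for s in suites])      # the suite IDs in wire order
```

The leading 0x001e is the byte length of the list (30 bytes, that is, 15 two-byte suite IDs), so the rule matches the whole list together with its length field.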

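Until a new version is released, my suggested mitigation is for clients to set allowInsecureCiphers to true. On the server side, you could reject all TCP connections that carry this signature, to force all clients to upgrade (but doing so may expose the server to active probing).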


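The discovery process and analysis follow.

The experiment was inspired by the article below, which reports that a machine-learning model can identify v2ray's tls+ws traffic with an accuracy as high as 0.9999: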

https://fr33land.net/2020/03/12/can-enable-tls-in-v2ray-help/

The author has open-sourced the trained model in the following repository:

https://github.com/rickyzhang82/V2Ray-Deep-Packet-Inspection

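In local testing the result was reproducible. It is also not limited to tls+ws: it works equally well on combinations such as tls+vmess. Other TLS traffic, such as browser traffic, produced no false positives at any point.

We therefore initially suspected a problem in the way v2ray uses utls to forge the Client Hello.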

https://github.com/v2ray/v2ray-core/blob/edb4fed387d27890902e7ee97aae0d97292f912b/transport/internet/tls/config.go#L176-L230

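Perhaps for security reasons, the cipher suites used here form an unusual combination that differs from that of the vast majority of browsers. For comparison, below is the Chrome Client Hello as mimicked by utls.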

https://github.com/refraction-networking/utls/blob/43c36d3c1f57546d5cbb05c066df7b5a78686c51/u_parrots.go#L141-L214

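Packet captures show that the Client Hello of real Chrome and that of utls are essentially identical, but both differ considerably from v2ray's, including in the cipher suites and the extensions. After we patched utls's Chrome cipher suites into v2ray, the model could no longer identify v2ray's TLS traffic.

So we can tentatively conclude that the model most likely learned the features of the TLS Client Hello, which is what allows the traffic to be identified.

In practice, however, identifying a TLS Client Hello does not require machine learning. Simple DPI is enough, so deploying this on the GFW would be cheap. And because this set of cipher suites is so unusual, the cipher suites alone suffice for accurate identification.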
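
As a rough illustration of how little DPI this takes, the following Python sketch walks the fixed-layout fields of a Client Hello and compares the offered cipher suites with the list from the signature in this issue. It assumes the whole Client Hello sits in a single TLS record and ignores extensions; the names `client_hello_cipher_suites`, `looks_like_v2ray` and `build_test_hello` are hypothetical, and the last one only builds a synthetic message for trying the parser out.

```python
import struct
from typing import List, Optional

# Cipher suite IDs taken from the signature quoted in this issue.
V2RAY_SUITES = [
    0xCCA8, 0xCCA9, 0xC02F, 0xC02B, 0xC030, 0xC02C, 0xC027, 0xC013,
    0xC023, 0xC009, 0xC014, 0xC00A, 0x1301, 0x1303, 0x1302,
]


def client_hello_cipher_suites(record: bytes) -> Optional[List[int]]:
    """Return the cipher suite IDs of a Client Hello, or None if it is not one."""
    # TLS record header: content type (1), version (2), length (2).
    if len(record) < 5 or record[0] != 0x16:
        return None
    pos = 5
    # Handshake header: message type (1), length (3). 0x01 is ClientHello.
    if len(record) < pos + 4 or record[pos] != 0x01:
        return None
    pos += 4
    pos += 2 + 32  # client_version (2) + random (32)
    if len(record) < pos + 1:
        return None
    pos += 1 + record[pos]  # session_id length byte + session_id
    if len(record) < pos + 2:
        return None
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    if suites_len % 2 or len(record) < pos + suites_len:
        return None
    return list(struct.unpack_from("!%dH" % (suites_len // 2), record, pos))


def looks_like_v2ray(record: bytes) -> bool:
    """True if the Client Hello offers exactly the suite list from this issue."""
    return client_hello_cipher_suites(record) == V2RAY_SUITES


def build_test_hello(suites: List[int]) -> bytes:
    """Build a synthetic, extension-less Client Hello (for testing only)."""
    body = b"\x03\x03" + bytes(32) + b"\x00"  # version, random, empty session_id
    body += struct.pack("!H", 2 * len(suites))
    body += struct.pack("!%dH" % len(suites), *suites)
    body += b"\x01\x00"  # compression methods: null only
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(looks_like_v2ray(build_test_hello(V2RAY_SUITES)))
print(looks_like_v2ray(build_test_hello([0x1301, 0x1302])))
```

An exact list comparison is the strictest form of the check. A real classifier would more likely tolerate small variations, but the point stands that no learned model is needed.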

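Incidentally, the order of the cipher suite list in the code appears to be the reverse of the order in the Client Hello that is actually sent. I do not know whether this is intentional or a bug.

My suggestion: the client should keep using utls, but should mimic the Client Hello of a browser such as Chrome or Firefox. AllowInsecureCiphers should take effect only on the server, and the server should be the one to restrict insecure cipher suites.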

@DuckSoft DuckSoft commented May 30, 2020

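The cipher suite signature is so distinctive that a quick memcmp starting at offset 0x90 identifies it precisely and efficiently:
image

"Signature": cca8cca9c02fc02bc030c02cc027c013c023c009c014c00a130113031302

You can even match it in plaintext with iptables... I will leave other PoC approaches to your imagination.

Reference code: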

iptables -I OUTPUT -m string --algo kmp --hex-string "|001ecca8cca9c02fc02bc030c02cc027c013c023c009c014c00a130113031302|" -j DROP

Upvote, and similar to v2ray/v2ray-core#1660, v2ray/v2ray-core#2098

@ghost ghost commented May 30, 2020

Confirmed with V2ray 4.23 on Archlinux:

image

image

This happens EVEN when the server is not configured to use TLS.
The issue occurs in the INITIAL TLS handshake packet, which means the problematic Client Hello message is always sent when AllowInsecureCiphers is set to false, which is the default value.

I suggest making this issue top priority.

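@p4gefau1t p4gefau1t changed the title from "Reproducing the PoC for machine-learning detection of v2ray TLS traffic, with a discussion of the cause" to "v2ray TLS traffic can be precisely identified by simple signature matching (PoC included)" May 30, 2020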

@proletarius101 proletarius101 commented May 30, 2020

This issue has been addressed by Tor and Naiveproxy. It's great that someone has noticed this underestimated problem in v2ray. However, it is not easy to remove the attack surface completely unless we integrate the relevant components of a popular browser and keep them up to date, as Tor and Naiveproxy do, because on closer inspection the connection behavior itself can be a strong fingerprint.

@proletarius101 proletarius101 commented May 30, 2020

Btw, generally speaking, it's simply a fingerprint of most Go programs. So it depends on what we want to mimic. Go programs evidently have different connection behavior than Firefox, for example.

@ghost ghost commented May 30, 2020

IMO, using the TLS library's default settings would resolve this problem: we would then look the same as other Go programs.

> Btw, generally speaking, it's simply a fingerprint of most Go programs. So it depends on what we want to mimic. Go programs evidently have different connection behavior from Firefox, for example.

@proletarius101 proletarius101 commented May 30, 2020

> IMO, using the TLS library's default settings would resolve this problem: we would then look the same as other Go programs.

Agreed

@hanazaki05 hanazaki05 commented May 30, 2020

There is a utls-enabled version that addresses the fingerprint issue. You need to inspect the code yourself to make sure it is safe.
https://github.com/emc2314/v2ray-core

@vcptr vcptr commented May 30, 2020

You don't need machine learning to find something unique in plaintext. The TLS Client Hello itself is sent in plaintext over the network.

Each browser and version has its own unique fingerprint. There is a project that collects TLS fingerprints and compiles statistics on them; see https://tlsfingerprint.io/

@p4gefau1t p4gefau1t commented May 30, 2020

> IMO, using the TLS library's default settings would resolve this problem: we would then look the same as other Go programs.
>
> Btw, generally speaking, it's simply a fingerprint of most Go programs. So it depends on what we want to mimic. Go programs evidently have different connection behavior from Firefox, for example.

Agreed. V2ray misused the utls library and blocked some "insecure" ciphers, which makes it easy to detect.

The easiest way to solve this is to use the default settings of Go's crypto/tls. But I think the best fix is to use utls's anti-fingerprinting features correctly.

https://github.com/refraction-networking/utls

@darhwa darhwa commented May 30, 2020

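I am curious what consideration originally led to adding such a default CipherSuites list.

If both the client and the server are under your own control, then whether or not CipherSuites is changed, the negotiated result will only ever be one of the three TLS 1.3 suites. And if the server itself is not secure, what good does changing this CipherSuites list do?

I suspect that whoever added this never really understood the TLS handshake. v2ray/v2ray-core#2477 is a recent example.

One more point: Go's default settings do not include a ClientSessionCache either, so I suggest removing that as well. If only the CipherSuites list is removed, the ClientHello will still carry one more extension, session_ticket, than that of a typical Go client.

A second point: the current ALPN settings are also very odd. Several places set ALPN to h2 only. Which mainstream applications set an h2-only ALPN on the client side? This too counts as a conspicuous signature.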

@KevinZonda KevinZonda commented May 30, 2020

@klzgrad you may be interested

@mnihyc mnihyc commented May 30, 2020

image
v2ws + tls (apache2): reproduced successfully. I thought putting it behind apache2 would be safe, but the signature is passed straight through...

@DuckSoft DuckSoft commented May 30, 2020

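> image
> v2ws + tls (apache2): reproduced successfully. I thought putting it behind apache2 would be safe, but the signature is passed straight through...

So the problem this time is that the client played the bungling teammate and sent the server a Client Hello with an extremely strong signature.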

@mnihyc mnihyc commented May 30, 2020

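> > image
> > v2ws + tls (apache2): reproduced successfully. I thought putting it behind apache2 would be safe, but the signature is passed straight through...
>
> So the problem this time is that the client played the bungling teammate and sent the server a Client Hello with an extremely strong signature.

Indeed. It may be time to consider replacing v2. There are quite a few alternative TLS-based options (the more obscure ones, I mean).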

@kotori2 kotori2 commented May 30, 2020

@studentmain But since very few apps use Go, it is still fairly easy to detect from its TLS fingerprint.

@proletarius101 proletarius101 commented May 30, 2020

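> Indeed. It may be time to consider replacing v2. There are quite a few alternative TLS-based options (the more obscure ones, I mean).

This defect also applies to Trojan, which uses OpenSSL with a customized client TLS configuration.

A traditional HTTPS proxy is also potentially detectable because of the small handshake packets in TLS.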

@StarryVoid StarryVoid commented May 31, 2020

Tested with V2rayN v3.18 + V2ray-core v4.23.1.
Among the various configurations that use TLS, only vmess+h2+tls was found to send no Client Hello by default.
Server-side capture command: tcpdump -ni eth0 "tcp port 443 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)"
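
For readers unfamiliar with the filter expression, here is a small Python sketch of what it computes. The upper nibble of byte 12 of the TCP header is the data offset in 32-bit words, so `(tcp[12] & 0xf0) >> 2` is the header length in bytes, and the filter tests whether the first payload byte is 0x16, the TLS handshake record type. The function name `first_payload_byte_is_handshake` is hypothetical, and the sample segments are synthetic.

```python
def first_payload_byte_is_handshake(tcp_segment: bytes) -> bool:
    """Mirror of the filter tcp[((tcp[12] & 0xf0) >> 2)] = 0x16."""
    if len(tcp_segment) < 13:
        return False
    # (x & 0xf0) >> 2 equals (x >> 4) * 4: data offset in words -> bytes.
    header_len = (tcp_segment[12] & 0xF0) >> 2
    if len(tcp_segment) <= header_len:
        return False  # no payload, e.g. a bare ACK
    return tcp_segment[header_len] == 0x16


# Synthetic 20-byte TCP header (data offset 5) followed by a TLS record start.
header = bytes(12) + b"\x50" + bytes(7)
print(first_payload_byte_is_handshake(header + b"\x16\x03\x01"))
print(first_payload_byte_is_handshake(header + b"\x17\x03\x03"))
```

Note that 0x16 marks any handshake record, so the filter also captures handshake messages other than the Client Hello, such as the Server Hello.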

@StarryVoid StarryVoid commented May 31, 2020

Also attached is Cloudflare's reference table of TLS Client Hello cipher suites:
https://raw.githubusercontent.com/cloudflare/mitmengine/master/reference_fingerprints/mitmengine/browser.txt

@ghost ghost commented May 31, 2020

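The current problem is not whether TLS is implemented with Go or with some other language or library.

V2ray-Core's current problem is this: when AllowInsecureCiphers is off, a handful of secure TLS cipher suites are hard-coded, so the ciphers in the Client Hello become a traffic fingerprint for V2ray.

The problem is not the characteristics of the TLS handshake packet as such, but that V2ray uses the TLS library incorrectly.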

@fdmove fdmove commented May 31, 2020

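> Incidentally, the order of the cipher suite list in the code appears to be the reverse of the order in the Client Hello that is actually sent. I do not know whether this is intentional or a bug.

This is probably meant to give priority to TLS 1.3.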

@xiaokangwang xiaokangwang commented May 31, 2020

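As things stand, the TLS library V2 uses is Go's TLS library, so its behavior is bound to differ somewhat from OpenSSL's. What can be done right now is to switch to Go's default cipher suites first.
In the long run, a fairly easy way to solve this is to forward the traffic locally on the client through a program that uses the OpenSSL library. For example: ncat -l localhost 1234 --sh-exec "ncat --ssl v2ray.example.com 1234"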

@GoldJohnKing GoldJohnKing commented May 31, 2020

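In my view, using Go's default TLS library is already enough. Go's default TLS library is so widely used that it is unlikely all Go programs would be blocked wholesale... So matching or resembling the TLS signature of other widely used Go programs should suffice. Setting up OpenSSL locally is of course even better.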

@maidmeow4 maidmeow4 commented May 31, 2020

reproduced
image

@rickyzhang82 rickyzhang82 commented May 31, 2020

@p4gefau1t @DuckSoft

Great job, guys! I'm not an expert on TLS. I really appreciate your finding this!

  • IMO, V2ray uses the stock TLS implementation from Golang. It doesn't use utls.
  • The V2Ray client side needs to blend its TLS traffic in with that of whatever web browser is popular in mainland China. That gives people a better chance of circumventing the GFW.
  • The V2Ray server side needs to defend against active probing by the CCP, as reported by Shadowsocks researchers at https://web.archive.org/web/20200416083158/https://gfw.report/. gfw.report is down now, but you can access it from the archive.

So far, I see that the proposal only addresses the client side. Is there any concern that the server side can be probed by the CCP because of leaks in the TLS handshake?

@DuckSoft DuckSoft commented May 31, 2020

> @p4gefau1t @DuckSoft
>
> Great job, guys! I'm not an expert on TLS. I really appreciate your finding this!
>
> * IMO, V2ray uses the stock TLS implementation from Golang. It doesn't use [utls](https://github.com/refraction-networking/utls).
>
> * The V2Ray client side needs to blend its TLS traffic in with that of whatever web browser is popular in mainland China. That gives people a better chance of circumventing the GFW.
>
> * The V2Ray server side needs to defend against active probing by the CCP, as reported by [Shadowsocks researchers here](https://web.archive.org/web/20200416083158/https://gfw.report/). gfw.report is down now, but you can access it from the archive.
>
> So far, I see that the proposal only addresses the client side. Is there any concern that the server side can be probed by the CCP because of leaks in the TLS handshake?

To be honest, I contributed only to validation and implementation. The original idea came from @p4gefau1t, so he is the one who deserves the credit. The server side may also be probed, but behind nginx/caddy the fingerprint is eliminated. As long as you keep your endpoint address secret, the GFW can't discover any difference.

But the cruel thing is: it's not the server's fault. So far it seems that it is your client that stirs up trouble and shouts at your server: "Come on boy, I need to circumvent the GFW!" I don't know how this can affect servers, but I can see someone broadcasting this very payload to scan competitors' servers. That should also be considered.

@lp123sun lp123sun commented May 31, 2020

Is WebSocket + TLS + Nginx path-based routing affected as well?

@DuckSoft DuckSoft commented May 31, 2020

> Is WebSocket + TLS + Nginx path-based routing affected as well?

Yes, it is affected.

@darhwa darhwa commented Jun 2, 2020

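@icebluey There are indeed quite a few clients that use http/1.1 only, but those are basically all old browser versions. How many Go programs use http/1.1 only? More precisely, how many Go programs with TLS 1.3 enabled still use http/1.1 only?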

@rickyzhang82 rickyzhang82 commented Jun 2, 2020

> You said "not everyone who circumvents the firewall is 'against censorship'". Do you think that is logical yourself? If they are not against censorship, why are they not content with the GFW's censorship, and why do they circumvent it to get information from outside?

Thank you! Finally, I have met someone who can reason with logic, unlike the fifty-cent party.

I'm done with the discussion here. I can smell that some accounts are fishy.

I want to state that the current fix is a sub-optimal solution, given that there are extremely few Go client apps inside and outside the GFW. The commit only replaced the fixed, ordered list of hard-coded ciphers with Go's default one. Thus, it makes V2Ray traffic blend in as a Go app.

But please name some client apps written in Go that the GFW doesn't want to ban. Docker CLI? Keep naming. There are NONE, no?

We need to investigate utls and see if it can help.

@proletarius101 proletarius101 commented Jun 2, 2020

> We need to investigate utls and see if it can help.

Parroting is supposed to work. The leaked information is limited to the ClientHello (for this TLS-related defect), so parroting it is enough to fix this problem. Further investigation should go into fully parroting Chrome or whatever browser is popular in China, since the connection behavior, including connection termination, error handling, etc., can also serve as a fingerprint.

Randomization, on the contrary, makes you suspicious, unless the GFW really does blacklisting. The GFW's approach is more like multi-layered whitelisting: it scores the traffic and reacts.

@ghost ghost commented Jun 28, 2020

Did anyone mention naiveproxy already?

klzgrad/naiveproxy#94

klzgrad/naiveproxy#94 (comment)

#754

#754 (comment)

@github-actions github-actions bot commented Nov 10, 2020

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days
