[Discuss] Enable user access to detailed routing information #105

Closed
Vigilans opened this issue Aug 21, 2020 · 21 comments
@Vigilans
Contributor

What version of V2Ray are you using?

4.27.0

What's your scenario of using V2Ray?

Build a custom geosite/geoip file.

What did you see?

The demand has been raised in the following issues (and more):

v2ray/discussion/issues/770

To reduce DNS queries, my approach is to read the access.log from the previous run before every start. It contains lines like these:

2020/07/04 04:18:17 tcp:127.0.0.1:41298 accepted tcp:api.vc.bilibili.com:443 [drOut] 
2020/07/04 04:18:43 tcp:127.0.0.1:41313 accepted tcp:google.com:443 [giaOut] 

I automatically extract the domain field from the giaOut lines, merge it into the existing domain file, regenerate the dat file, and then start v2ray.

// Combined with this routing rule
{
    "type": "field",
    "outboundTag": "giaOut",
    "domain": [
        "ext:proxy.dat:giaOut"
    ]
}

This way, sites that were proxied before no longer trigger DNS queries. It amounts to a personally customized domain database.

v2ray/discussion/issues/592

"log": {
"loglevel": "debug",
"access": "access.log"
},

cat access.log |grep proxy | awk '{print $5}'
sort uniq....

v2ray/discussion/issues/804

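I'd like to use a V2Ray transparent proxy (V2Ray runs on the gateway) to watch US shows on HBO/Netflix and the like (there are AirPlay/Chromecast devices, so only a global proxy will do).

V2Ray on the gateway works together with iptables and uses databases such as
"geosite:hbo",
"ext:h2y.dat:hbo",
to decide whether traffic goes through the proxy, but they do not seem complete enough and errors occur often, whereas routing all traffic through the proxy with SR on a phone causes no errors.
Is there an additional database, or a simple tool for crawling the IPs of the relevant video sites?
Thanks!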

In short, there is both demand for and existing practice of building a custom geosite/geoip file from V2Ray's routing information.

In the referenced issues, the outbound tag is used to classify domestic and overseas sites and to augment a custom geosite file with domains missing from the standard ones. With the full routing context, much more could be done:

  • With a transparent proxy, we can track the source IP of devices like the Nintendo Switch to retrieve the actual domains they use and the IPs of game servers.
  • With tools like proxychains, we can redirect the traffic of a specific application to a specific V2Ray inbound; by tracking the inbound tag, we can then retrieve all the related domains and IPs.

... and much of this information could eventually be contributed to v2fly/domain-list-community.

What's your expectation?

Currently, routing information is consumed through access.log, which provides the following fields:

  • Source IP
  • Source Port
  • Network
  • Target IP / Target Domain
  • Target Port
  • Outbound Tag
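
As an illustration of the fields listed above, here is a minimal Go sketch that splits one access-log line into those fields. It assumes the line layout seen in the samples quoted in this issue (date, time, source, the word "accepted", target, bracketed outbound tag). The `AccessEntry` type and `parseAccessLine` function are hypothetical names, not part of v2ray-core, and the format itself is not guaranteed to be stable.

```go
package main

import (
	"fmt"
	"strings"
)

// AccessEntry holds the fields visible in one access-log line.
// Hypothetical type for illustration; not part of v2ray-core.
type AccessEntry struct {
	Source   string // source IP and port, with any "tcp:"/"udp:" prefix removed
	Network  string // "tcp" or "udp", taken from the target field
	Target   string // target domain or IP
	Port     string // target port
	Outbound string // outbound tag without the brackets
}

// parseAccessLine parses lines shaped like the samples in this issue, e.g.
//   2020/07/04 04:18:43 tcp:127.0.0.1:41313 accepted tcp:google.com:443 [giaOut]
// It returns false for lines that do not match that assumed layout.
func parseAccessLine(line string) (AccessEntry, bool) {
	f := strings.Fields(line)
	if len(f) < 6 || f[3] != "accepted" {
		return AccessEntry{}, false
	}
	src := strings.TrimPrefix(strings.TrimPrefix(f[2], "tcp:"), "udp:")
	parts := strings.SplitN(f[4], ":", 2) // network, then host:port
	if len(parts) != 2 {
		return AccessEntry{}, false
	}
	i := strings.LastIndex(parts[1], ":") // the port follows the last colon
	if i < 0 {
		return AccessEntry{}, false
	}
	return AccessEntry{
		Source:   src,
		Network:  parts[0],
		Target:   parts[1][:i],
		Port:     parts[1][i+1:],
		Outbound: strings.Trim(f[5], "[]"),
	}, true
}

func main() {
	lines := []string{
		"2020/07/04 04:18:17 tcp:127.0.0.1:41298 accepted tcp:api.vc.bilibili.com:443 [drOut]",
		"2020/07/04 04:18:43 tcp:127.0.0.1:41313 accepted tcp:google.com:443 [giaOut]",
		"2020/08/19 00:20:17 192.168.200.7:55423 accepted udp:192.168.200.1:53 [dns-out]",
	}
	for _, l := range lines {
		if e, ok := parseAccessLine(l); ok {
			fmt.Printf("%s -> %s %s:%s via %s\n", e.Source, e.Network, e.Target, e.Port, e.Outbound)
		}
	}
}
```

Note that nothing in such a line identifies the inbound or a sniffed domain, which is the limitation discussed next.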

But the access log has some problems:

  • Fields like the inbound tag are absent.
  • Balancers are not tracked; only the final outbound is reported.
  • Sniffed domains are not used in the log. Only proxy requests that send the domain directly to the inbound (e.g. http and socks) get it shown in the log. So a transparent proxy with a dokodemo-door inbound produces an access log without any domains:
    2020/08/19 00:20:17 192.168.200.7:55724 accepted tcp:106.75.15.133:443 [direct] 
    2020/08/19 00:20:17 192.168.200.7:55423 accepted udp:192.168.200.1:53 [dns-out] 
    2020/08/19 00:20:18 192.168.200.7:55386 accepted tcp:149.154.175.100:443 [proxy] 
    
  • Using the access log usually means reading an on-disk log file (on stdout, access and error logs often get mixed together), which poses a potential privacy problem.

Given the arguments above, I would like to give users a proper way to access the complete routing context, while keeping such access restricted.

@Vigilans
Contributor Author

Provide routing stats as subscribable service

Because of 1) the inherent privacy problem and 2) the inconvenience of parsing plain text into in-memory data structures, attaching more information to the access log may not be a good idea. If a custom client could receive the in-memory context object directly and process it in a full programming language (e.g. Go), it would be much more powerful.

So personally, I propose using the stats manager and API service to provide routing information as a stream. That is, with a new stats config:

"stats": {
    "routing": { "enabled": true }
}

A new gRPC streaming service could then be registered:

service StatsService {
  rpc SubscribeRoutingStats(RoutingStatsRequest) returns (stream RoutingStatsResponse) {}
}

message RoutingStatsRequest {
  repeated string fields = 1;
}

message RoutingStatsResponse {
  string InboundTag = 1;
  v2ray.core.common.net.Network Network = 2;
  repeated bytes SourceIPs = 3;
  repeated bytes TargetIPs = 4;
  uint32 SourcePort = 5;
  uint32 TargetPort = 6;
  string TargetDomain = 7;
  string Protocol = 8;
  string UserEmail = 9;
  repeated AttributeStat Attributes = 10;
  repeated string DetourTags = 11;
  string OutboundTag = 12;
}

A custom client can now subscribe to this stream, continuously receiving routing contexts and processing them as it sees fit.

To reduce the transport burden, the client can specify which fields it wants to retrieve:

"fields": ["inbound", "ip", "port", "outbound"]

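One way the field selection above could behave is sketched below in Go. This is only an assumption about the semantics: `RoutingEntry` is a simplified, hypothetical stand-in for `RoutingStatsResponse`, the selectors "inbound", "ip", "port" and "outbound" come from the example request, and both the "domain" selector and the mapping of "ip"/"port" to source and target alike are guesses made for illustration.

```go
package main

import "fmt"

// RoutingEntry is a simplified, hypothetical stand-in for the proposed
// RoutingStatsResponse message; IPs are kept as strings for brevity.
type RoutingEntry struct {
	InboundTag   string
	SourceIPs    []string
	TargetIPs    []string
	SourcePort   uint32
	TargetPort   uint32
	TargetDomain string
	OutboundTag  string
}

// selectFields returns a copy of e in which only the requested field groups
// are populated. Unknown selectors are ignored. The selector-to-field mapping
// is an assumption, not something defined by the proposal.
func selectFields(e RoutingEntry, fields []string) RoutingEntry {
	var out RoutingEntry
	for _, f := range fields {
		switch f {
		case "inbound":
			out.InboundTag = e.InboundTag
		case "ip":
			out.SourceIPs, out.TargetIPs = e.SourceIPs, e.TargetIPs
		case "port":
			out.SourcePort, out.TargetPort = e.SourcePort, e.TargetPort
		case "domain": // assumed selector, not in the example request
			out.TargetDomain = e.TargetDomain
		case "outbound":
			out.OutboundTag = e.OutboundTag
		}
	}
	return out
}

func main() {
	full := RoutingEntry{
		InboundTag:   "transparent",
		SourceIPs:    []string{"192.168.200.7"},
		TargetIPs:    []string{"149.154.175.100"},
		SourcePort:   55386,
		TargetPort:   443,
		TargetDomain: "example.com",
		OutboundTag:  "proxy",
	}
	fmt.Printf("%+v\n", selectFields(full, []string{"inbound", "ip", "port", "outbound"}))
}
```

Fields that were not requested stay at their zero value, which protobuf would omit on the wire.
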
Benefit

  • More complex logic can be applied to the routing context to dispatch a domain/IP to a target category, allowing a finely managed geosite/geoip file to be built.
  • Unlike the access log, nothing touches the disk; data is exchanged privately in memory.
  • V2Ray's GUI distribution could utilize this service to provide a more powerful statistics service. It could act as a client, subscribe to the stream, receive routing entries, and visualize them. Filtering and grouping could be done there, along with processing the data into a geosite/geoip file, making custom geofiles more accessible to end users.

Privacy problem

As @klzgrad stated in the Fake DNS discussion (v2ray/v2ray-core#2233), being able to record and keep browsing history is problematic. Providing such records through the stats service is powerful yet somewhat risky, and should be carefully regulated.

To regulate the usage of the routing stats service, I propose:

  • The stats service does not keep the data. The routing context is sent from the router and received directly by the client; it is kept neither on disk nor in memory. How the context data is handled is then up to the client's implementation.
  • When the routing stats service is enabled, at most one client can subscribe to the gRPC stream. Whoever enables routing stats is responsible for making sure their own service has subscribed successfully, so that routing stats cannot be taken over by third-party applications.
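
The two measures above can be sketched in Go as a publisher that accepts a single subscriber and drops whatever cannot be delivered immediately. This is a simplified illustration, not the actual v2ray-core implementation: `Publisher`, `Entry` and `ErrBusy` are hypothetical names, and a plain channel stands in for the gRPC stream.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Entry is a hypothetical, reduced routing record.
type Entry struct {
	TargetDomain string
	OutboundTag  string
}

// ErrBusy is returned when a second client tries to subscribe.
var ErrBusy = errors.New("routing stats already has a subscriber")

// Publisher forwards entries to at most one subscriber and stores nothing itself.
type Publisher struct {
	mu  sync.Mutex
	sub chan Entry // nil while nobody is subscribed
}

// Subscribe registers the single allowed subscriber. The buffer only holds
// entries that are in flight to that subscriber.
func (p *Publisher) Subscribe(buffer int) (<-chan Entry, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.sub != nil {
		return nil, ErrBusy
	}
	p.sub = make(chan Entry, buffer)
	return p.sub, nil
}

// Unsubscribe closes the stream and frees the slot for another client.
func (p *Publisher) Unsubscribe() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.sub != nil {
		close(p.sub)
		p.sub = nil
	}
}

// Publish hands e to the subscriber if one exists and has room.
// Otherwise the entry is dropped; it is never queued for later delivery.
func (p *Publisher) Publish(e Entry) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.sub == nil {
		return false
	}
	select {
	case p.sub <- e:
		return true
	default:
		return false
	}
}

func main() {
	p := &Publisher{}
	fmt.Println("delivered without subscriber:", p.Publish(Entry{TargetDomain: "a.example"}))

	ch, _ := p.Subscribe(1)
	_, err := p.Subscribe(1)
	fmt.Println("second subscriber:", err)

	p.Publish(Entry{TargetDomain: "b.example", OutboundTag: "proxy"})
	fmt.Printf("received: %+v\n", <-ch)
}
```

Dropping on a full buffer keeps the router from blocking on a slow client, at the cost of lost entries; whether that trade-off is acceptable is a design decision left open here.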

With the two measures above, the stats service may be a more secure way to retrieve routing information than the access log.

I would like to hear ideas on these proposals and on other measures (like authentication?) that could improve the security of such a service, as well as any well-reasoned objections to this feature.

@Vigilans
Contributor Author

Vigilans commented Aug 21, 2020

@Robot-DaneelOlivaw

Robot-DaneelOlivaw commented Aug 21, 2020

Just finished reading the context. I was going to propose a modification to the access log when we talked about DNS a few days ago, yet hesitated. But your proposal is much more refined. As far as I'm concerned, it could be a great help in understanding network traffic.

My questions are mainly on the user side:

  • Will the access log still be available? If so, the privacy risk hasn't been reduced. If not, developers of custom clients should be informed, or given time to update accordingly, before a new version of the core rolls out.
  • Is the core going to provide a built-in geosite/geoip generator, since your goal is to build custom lists?

P.S.

  • Is it viable to feed the core a domain/IP, run it through the DNS and routing process, and get the resulting outbound tag/DNS response, without actually making the request?

@IceCodeNew
Contributor

I must say that the idea of exposing routing stats through a stream service rather than a disk log is impressive.
I share the concern about privacy and support the proposed framework for restricting access to routing stats. In fact, of the whole blueprint, nothing is more attractive to me than the idea of using the data to contribute to the v2fly/domain-list-community project.

But I still have a concern: how would this step affect the traditional way V2Ray users debug their configuration? IMO, access.log is not something to be disregarded. The role we expect it to play is the same as that of access.log for NGINX.
This proposal seems to leave the questions of "whether to log or not" and "how to log" to third-party clients. That could lead to hundreds of different log formats, making it difficult for anyone to troubleshoot configuration problems. Also, I would regard it as a failure if v2ray-core gave away all control of logging to third-party clients.
I think it is better to refine the current logging implementation in v2ray-core than to partially remove its function. That way, we keep control of a unified log format. It could also gain the ability to rotate logs that are several days old, ensuring that access.log stays a TEMPORARY access log.

@Vigilans
Contributor Author

Vigilans commented Aug 21, 2020

@Robot-DaneelOlivaw

Will the access log still be available?

Personally I think it should be left untouched, for the following reasons:

  • The access log is a de facto standard in V2Ray. The project's issue template requires it, and many users and some scripts rely on it. It should be kept stable for a long time, unless the stats service approach becomes widely accepted as a new standard.
  • Since V2Ray 4.20, log.access can be set to "none" to disable the access log. So the V2Fly guide could state that users may disable the access log if they opt for the routing stats service to retrieve access info.

Is the core going to provide a built-in geosite/geoip generator, since your goal is to build custom lists?

I am personally using a modified version of gamesofts/v2ray-custom-geo, which imports the v2ray and protobuf packages to build a custom list. Currently I have no good ideas on how to build geofile generation into the core.

Since the demands on geofile generation vary and have not been formalized into a well-accepted standard (as far as I know), it may be appropriate to leave it to the community for now, until consensus is reached on what interface the core should provide. Or do you already have some proposals in mind?

Is it viable to feed the core a domain/IP, run it through the DNS and routing process, and get the resulting outbound tag/DNS response, without actually making the request?

This is achievable with what I've implemented so far: an interface for the routing context.

type Context interface {
	GetInboundTag() string
	GetSourceIPs() []net.IP
	GetSourcePort() net.Port
	GetTargetIPs() []net.IP
	GetTargetPort() net.Port
	GetTargetDomain() string
	GetNetwork() net.Network
	GetProtocol() string
	GetUser() *protocol.MemoryUser
	GetAttributes() map[string]interface{}
}

So one could manually set up a Context object and pass it to the router to get the result. This feature could be implemented independently, whether or not the stats service is adopted.
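
A dry-run route test along these lines might look like the Go sketch below. It is a toy under stated assumptions: `RouteContext` is a reduced, standard-library-only stand-in for the `Context` interface above, and `staticContext`, `rule`, `pickOutbound`, `newStaticContext` and `mustCIDR` are hypothetical helpers; the real router's rule matching is far richer. The CIDR in `main` is an arbitrary illustrative range.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// RouteContext is a reduced stand-in for the Context interface above,
// using only standard-library types.
type RouteContext interface {
	GetInboundTag() string
	GetTargetIPs() []net.IP
	GetTargetDomain() string
}

// staticContext is a hand-built context that carries no real connection.
type staticContext struct {
	inbound string
	domain  string
	ips     []net.IP
}

func (c staticContext) GetInboundTag() string   { return c.inbound }
func (c staticContext) GetTargetIPs() []net.IP  { return c.ips }
func (c staticContext) GetTargetDomain() string { return c.domain }

// newStaticContext builds a context from strings; unparsable IPs are skipped.
func newStaticContext(inbound, domain string, ips ...string) RouteContext {
	c := staticContext{inbound: inbound, domain: domain}
	for _, s := range ips {
		if ip := net.ParseIP(s); ip != nil {
			c.ips = append(c.ips, ip)
		}
	}
	return c
}

// rule is a toy routing rule: it matches on a domain suffix or a CIDR.
type rule struct {
	domainSuffix string
	cidr         *net.IPNet
	outbound     string
}

// mustCIDR parses a CIDR literal and panics on a malformed one.
func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

// pickOutbound returns the outbound tag of the first matching rule, or
// fallback if none matches. No network request is made.
func pickOutbound(ctx RouteContext, rules []rule, fallback string) string {
	for _, r := range rules {
		if d := ctx.GetTargetDomain(); r.domainSuffix != "" &&
			(d == r.domainSuffix || strings.HasSuffix(d, "."+r.domainSuffix)) {
			return r.outbound
		}
		if r.cidr != nil {
			for _, ip := range ctx.GetTargetIPs() {
				if r.cidr.Contains(ip) {
					return r.outbound
				}
			}
		}
	}
	return fallback
}

func main() {
	rules := []rule{
		{domainSuffix: "google.com", outbound: "proxy"},
		{cidr: mustCIDR("149.154.160.0/20"), outbound: "proxy"},
	}
	ctx := newStaticContext("transparent", "", "149.154.175.100")
	fmt.Println(pickOutbound(ctx, rules, "direct"))
}
```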

@kslr
Contributor

kslr commented Aug 21, 2020

I think the log can be modified. There will be many breaking changes in the upcoming 4.28.

As for "airports" (commercial proxy providers), they use their own modified versions.

@ToutyRater
Contributor

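I think @Vigilans's idea is a good one. I'm currently using a transparent proxy setup and have indeed run into scenarios where geosite cannot meet my needs, so a customized geosite is the better option. Earlier I wrote a script that processed the contents of error.log and appended them to the dns and routing configs; perhaps the script was not well written, because it sometimes produced unexpected results. I lean towards obtaining detailed routing information through gRPC.
@IceCodeNew's idea is fine too, but I think a customized geosite does not conflict with domain-list-community.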

@IceCodeNew
Contributor

but I think a customized geosite does not conflict with domain-list-community.

I feel you may have misunderstood what I meant...

@Vigilans
Contributor Author

Vigilans commented Aug 21, 2020

@IceCodeNew

I fully share your concern about undermining the access log. My previous comment, written before yours was published, also reckons that access.log is a de facto standard and should be respected.

More specifically, the usage of the routing stats service should be divided into two parts and discussed separately:

Logging

The usage I mentioned:

V2Ray's GUI distribution could utilize this service to provide a more powerful statistics service.

is an actual threat to the standing of access.log. Once the routing stats service is adopted for logging, it may break the ecosystem built around the unified access log format, and this is irreversible since dependencies keep growing.

Geofile generation

The usage of routing context in logging and geofile generation differs in such ways:

  • Logging does not require the info as detailed as possible. The current access.log provides enough information for debugging to some extent, except that the inbound tag and the sniffed domain should be included.
  • Because demands vary, the routing context passed to geofile generation should be as comprehensive as possible.

So, regarding the option of retrieving routing information from the access log only, some issues should be pointed out and discussed:

One is serialization: complex structures like attributes cannot easily be written into a log message. Also, the more fields are written into a message, the less readable the log becomes.

The other is deserialization. This brings up an important question: have you expected the geofile generation to be supported as basic service in most clients, or to be supported in a github repo which will be widely accepted as standard?

If so, then stable retrieval of the routing context becomes a concern. Currently, routing-based geofile generation is done only by self-written scripts; I think one of the reasons is that the access log format is not guaranteed to be stable. Unlike something like JSON, it cannot be deserialized safely.

This prevents a standard implementation of geofile generation from emerging. If v2ray-core were to add a new routing context field to access.log entries, how could a client guarantee that its deserialization code would not break after upgrading the core? With gRPC, compatibility can be maintained between versions, so it is suitable for a client to subscribe to and build a stable service on.
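
To illustrate the compatibility point, the Go sketch below uses JSON as a stand-in for a keyed format (the proposal itself uses gRPC/protobuf, which tolerates added fields in a similar way). A client written against an older record shape keeps working when a newer core adds a field, because Go's `encoding/json` ignores unknown keys by default. The `routingRecord` type and its JSON key names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routingRecord is what an older client knows about. The JSON key names
// are hypothetical; they are not defined anywhere in v2ray-core.
type routingRecord struct {
	TargetDomain string `json:"targetDomain"`
	OutboundTag  string `json:"outboundTag"`
}

// decodeRecord decodes one keyed record. Keys the struct does not declare
// are skipped, so records from a newer producer still decode.
func decodeRecord(s string) (routingRecord, error) {
	var r routingRecord
	err := json.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	older := `{"targetDomain":"google.com","outboundTag":"proxy"}`
	// A newer producer adds an inbound tag in front of the existing fields.
	newer := `{"inboundTag":"transparent","targetDomain":"google.com","outboundTag":"proxy"}`

	for _, s := range []string{older, newer} {
		r, err := decodeRecord(s)
		fmt.Println(r.TargetDomain, r.OutboundTag, err)
	}
}
```

A whitespace-split access-log line offers no such guarantee: inserting a field shifts every position after it.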

The arguments above are offered for consideration, not merely to advocate for the stream service...

@Loyalsoldier
Contributor

Loyalsoldier commented Aug 21, 2020

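I think eventually implementing a mechanism like cow's, which automatically determines whether a domain/IP is blocked, would be a good solution for using V2Ray as a transparent proxy. But for cases like Netflix, which is not blocked but must go through the proxy, manual work is still needed.

(However, using a mechanism like cow's inevitably means the first access to some domains/IPs is, in a sense, a direct attempt to reach a blocked destination, which may be a problem for some people.)

As for the claim that logging involves privacy, I don't think it holds up. Other than that, I have no opinions or suggestions.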

@Robot-DaneelOlivaw

In that case, I agree with deploying the routing stats service while keeping access.log until the community has fully migrated to the new feature. Is there any chance it will affect performance?

Personally, I haven't tried any of these automated geofile generation approaches. But whether or not it becomes official, using the routing stats service is more elegant than processing access.log.

@kslr
Contributor

kslr commented Aug 22, 2020

For now, adding it to the API is a fairly reasonable choice.

@IceCodeNew
Contributor

The usage of routing context in logging and geofile generation differs in such ways:
Logging does not require the info as detailed as possible.

Can't agree more.


I vote to retrieve routing information during deserialization. And for the question:

have you expected the geofile generation to be supported as basic service in most clients, or to be supported in a github repo which will be widely accepted as standard?

My answer is No. But it is still worth considering in advance a mechanism that keeps deserialization safe and compatible. gRPC seems to be a good option.

@Vigilans
Contributor Author

Vigilans commented Sep 21, 2020

@kslr @Robot-DaneelOlivaw

Currently, two APIs related to routing are proposed:

  • Subscribe routing stats
  • Test routing stats

Regarding how to expose these services to users, two config styles are proposed:

Reuse StatsService

JSON config:

"stats": {
  "routing": {
    "enabled": true
  }
},
"api": {
  "services": ["StatsService"]
},

All the service code is implemented in app/stats/command.

Similarly, other stats like DNS would also reuse StatsService.

Use new RoutingService

JSON config:

"stats": {},
"api": {
  "services": ["RoutingService"] // Or "RouterService"?
},

All the service code is implemented in a new package app/router/command.

Similarly, other features like DNS and health check may also create their own services.

Which one do you prefer?

@Loyalsoldier
Contributor

I think it's better to use it as a new Service.

@Robot-DaneelOlivaw

I'm not sure I have a say in this, since I've never used the statistics information and don't understand how the different implementations might affect things. Currently, StatsObject has no parameters. Reusing StatsService might be easier to understand semantically, without leaving "stats": empty.

But I'll agree with anyone who has a different thought. And as always, appreciate your work.

@Vigilans
Contributor Author

Regarding reusing StatsService, there are some perspectives to discuss.

Code Organization

For now, there are two routing APIs to be put in StatsService:

  • Subscribe to Routing Stats
  • Manual Route Test

If we were to support new APIs for DNS, there are also two I can come up with:

  • Subscribe to DNS Lookup Records
  • Manual DNS Lookup Order Test.

Grouping all these APIs together bloats StatsService. Beyond that, Manual Route Test and Manual DNS Lookup Order Test cannot really be regarded as statistics services.

Even if one accepts Manual Route Test as a stats API, on the grounds that it retrieves routing stats manually, imagine providing APIs for an outbound health check app. We might have:

  • Subscribe to Outbound Health Check Result
  • Manually feed a URL to health check an outbound

From my perspective, the latter API cannot be regarded as a stats API.

@Vigilans
Contributor Author

Vigilans commented Sep 21, 2020

Code Implementation

It should be noted that "stats" and "services": ["StatsService"] correspond to different modules in the core:

  • "stats": app/stats/stats.go: Manager
  • "StatsService": app/stats/command/command: statsServer

And command.statsServer cannot retrieve the config of stats.Manager. By setting

"stats": {
  "routing": {
    "enabled": true
  }
},

, stats.Manager controls the registration of the routing stats channel, and thereby indirectly controls whether statsService supports subscribing to routing stats. This does not apply to Manual Route Test. There is no way for stats.Manager to prevent statsService from requiring the Router feature. So once "StatsService" is written in the JSON config, Manual Route Test is exposed to users by default along with the other stats APIs, and cannot be turned off.

By adopting "RoutingService", the API division is more fine-grained: users can use this service to control whether the manual route test API is exposed.

@Vigilans
Contributor Author

Vigilans commented Sep 21, 2020

@Loyalsoldier @Robot-DaneelOlivaw

After writing the discussion above, I came up with a new proposal that combines both of your ideas.

To achieve finer granularity, we may use the new "RoutingService" while keeping the semantics of the stats config:

"stats": {
  "routing": {
    "enabled": true
    // ...some other configs of channel for future use
  }
},
"api": {
  "services": ["RoutingService"]
},

Here, both routing APIs belong to "RoutingService", but to subscribe to routing stats, users need to explicitly register the channel in the "stats" config, so that v2ray-core allocates resources for collecting routing statistics. Other services like DNS could follow the same pattern. By grouping them in the "stats" config, users can be sure v2ray-core incurs no extra overhead for recording statistics they did not explicitly enable.
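
The gating described here can be sketched in Go as follows. This is an interpretation of the proposal rather than v2ray-core's config loader: the `config` struct and `routingFeatures` function are hypothetical, and the sketch assumes the two fragments sit at the top level of a JSON object without comments.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config mirrors only the two fragments discussed above.
type config struct {
	Stats struct {
		Routing struct {
			Enabled bool `json:"enabled"`
		} `json:"routing"`
	} `json:"stats"`
	API struct {
		Services []string `json:"services"`
	} `json:"api"`
}

// routingFeatures reports which routing APIs the config would expose under
// the combined proposal: the manual route test needs only "RoutingService",
// while subscribing additionally needs stats.routing.enabled.
func routingFeatures(raw string) (subscribe, routeTest bool, err error) {
	var c config
	if err = json.Unmarshal([]byte(raw), &c); err != nil {
		return false, false, err
	}
	for _, s := range c.API.Services {
		if s == "RoutingService" {
			routeTest = true
		}
	}
	subscribe = routeTest && c.Stats.Routing.Enabled
	return subscribe, routeTest, nil
}

func main() {
	raw := `{
	  "stats": { "routing": { "enabled": true } },
	  "api": { "services": ["RoutingService"] }
	}`
	sub, test, err := routingFeatures(raw)
	fmt.Println("subscribe:", sub, "route test:", test, "err:", err)
}
```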

@github-actions
Contributor

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days


6 participants