Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maxDialConcurrency = 1000 #478

Closed
takakawa opened this issue Nov 22, 2018 · 9 comments
Closed

maxDialConcurrency = 1000 #478

takakawa opened this issue Nov 22, 2018 · 9 comments

Comments

@takakawa
Copy link

Hi, I use fasthttp to serve as a reverse-proxy.I occasionaly found there were many long durtions requests which shouldn't be so long.I read the fasthttp code and made some logs.Now I am sure that maxDialConcurrency (1000) is too small for me .

How does the 1000 come from ?Can I change it ?

@erikdubbelboer
Copy link
Collaborator

Are you sure you have more than 1000 connections trying to dial out at the same time? I'm guessing they are hanging trying to dial unavailable addresses? As a quick solution now you could try setting a lower dial timeout.

The commit that added the 1000 doesn't seem to clarify the reason why this number was chose. I can't think of any reason either. To be honest I don't see a reason to limit this at all. I suggest we remove this feature for now, if we want to add it back later we should but in a different way that makes it user configurable.

@erikdubbelboer
Copy link
Collaborator

@kirillDanshin any thoughts?

@takakawa
Copy link
Author

takakawa commented Nov 23, 2018

The diaI timeout does not seem to cancel the dailing process.It just returns from the go-routine.However mayby I am wrong, maxDialConcurrency is not the problem.

I tried to make maxDialConcurrency=1000000,but the problems is still there.And I add a log
in the tryDial

236     go func() {
237         var dr dialResult
238         t1 := time.Now()
239         dr.conn, dr.err = net.DialTCP(network, nil, addr)
240         t2 := time.Now().Sub(t1)
241         if t2 > time.Second {
242         slog.Info("-----------%v--------",t2)
243         }
244         ch <- dr
245         <-concurrencyCh
246     }()

I got some amazing prints...


 2018-11-23T15:58:01.678987 INFO tcpdialer.go:242: -----------2m7.843203449s--------
 2018-11-23T15:58:02.062974 INFO tcpdialer.go:242: -----------2m7.277283506s--------
 2018-11-23T15:58:02.446981 INFO tcpdialer.go:242: -----------2m7.711398101s--------
 2018-11-23T15:58:02.958981 INFO tcpdialer.go:242: -----------2m7.223396289s--------
 2018-11-23T15:58:03.086976 INFO tcpdialer.go:242: -----------2m7.251240504s--------
 2018-11-23T15:58:03.470975 INFO tcpdialer.go:242: -----------2m7.685300522s--------
 2018-11-23T15:58:03.598977 INFO tcpdialer.go:242: -----------2m8.763251228s--------
 2018-11-23T15:58:05.006976 INFO tcpdialer.go:242: -----------2m8.221296814s--------
 2018-11-23T15:58:05.134975 INFO tcpdialer.go:242: -----------2m8.299212725s--------
 2018-11-23T15:58:05.518979 INFO tcpdialer.go:242: -----------2m7.783395819s--------
 2018-11-23T15:58:05.518980 INFO tcpdialer.go:242: -----------2m8.783394264s--------
 2018-11-23T15:58:06.030979 INFO tcpdialer.go:242: -----------2m8.19521171s--------
 2018-11-23T15:58:06.030980 INFO tcpdialer.go:242: -----------2m7.295394339s--------
 2018-11-23T15:58:06.542986 INFO tcpdialer.go:242: -----------2m8.757301156s--------
 2018-11-23T15:58:06.542988 INFO tcpdialer.go:242: -----------2m7.757270941s--------
 2018-11-23T15:58:06.840256 INFO tcpdialer.go:242: -----------1.034213668s--------
 2018-11-23T15:58:06.852067 INFO tcpdialer.go:242: -----------1.042093656s--------

I am in our production env,the network is quite good.ttl is very low. The dial timeout I set is 50ms.No matter what happens(host achievable,service down,etc),the tcpdialer shouldn't dial that long.Any ideas?

@erikdubbelboer
Copy link
Collaborator

Maybe you can try printing between these lines instead:

fasthttp/tcpdialer.go

Lines 186 to 187 in e5f51c1

conn, err = tryDial(network, &addrs[idx%n], deadline, d.concurrencyCh)
if err == nil {

And print addr as well to see which domain(s) is/are causing this.

@takakawa
Copy link
Author

I finally found the reason.I was testing network,So I used iptables to drop some packet,which surely made some host unachievable.And the net.ipv4.tcp_syn_retries is 6.That added up to nearly 2m7s.

So when a host became unachievable, and the net.ipv4.tcp_syn_retries is very large.The 1000 limit runs out very quickly.

@erikdubbelboer
Copy link
Collaborator

I guess the question is now should fasthttp protect the user against this or should we remove it and just let the system run out of file descriptors or sockets to use. I'm in favor of the second, let the user configure things the way they like. @kirillDanshin what do you think?

@erikdubbelboer
Copy link
Collaborator

@kirillDanshin what do you think? If you don't care I'll change the code to remove this protection (yay less code) and let the OS handle it.

@kirillDanshin
Copy link
Collaborator

@erikdubbelboer I think we should only add an option to disable this, but not removing this at all

@erikdubbelboer
Copy link
Collaborator

@kirillDanshin how do you see this option? It's currently implemented as a global option which makes it very hard to make this configurable. One way is to add a another version of all Dial, DialTimeout, DialDualStack and DialDualStackTimeout but that gets really messy. What do you have against removing the code completely? It's not good code and a very arbitrary limit at the moment. It makes much more sense to have people control this using the OS by setting the socket limit for the process. That's how any other program or library handles things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants