Stuck connecting #963
Update: The problem occurs with 3 different wifi networks as well as cellular |
I'm not sure if this is related, just posting FYI: I have noticed this issue on my iPhone when switching from Wi-Fi to 4G (when I go just out of reach of the Wi-Fi network). It stays stuck like this forever, until I toggle the "Status" switch to "Off" and back to "On"; then it says "Connected" and everything works again. (I should note that the Algo server I am running was deployed a long time ago, in December 2017, on AWS EC2.) |
@notDavid I'm having the same problem. I deployed the I've even had the disconnect / reconnect loop on my MBA recently. I'm not sure what the problem is. It almost feels like some type of session timeout issue. If I leave the VPN disconnected for a few hours it will allow me to reconnect. Is it possible that when the device disconnects, the session is not properly terminated on the VPN? |
@notDavid I disabled charon.dos_protection and I haven't had the problem all weekend. |
@QuentinMoss thanks for sharing that! I've disabled charon.dos_protection as well; let's see if the problem recurs in the next week... |
@notDavid @QuentinMoss I also have a test server using the ansible2.5 branch. Where's the charon.dos_protection setting in order to disable it? strongswan.conf? then restart? |
@digeratus in file /etc/strongswan.d/charon.conf search for dos_protection |
Thanks. Then "ipsec reload/restart" for it to take effect? What command?
… On Jul 24, 2018, at 6:29 PM, David ***@***.***> wrote:
@digeratus in file /etc/strongswan.d/charon.conf search for dos_protection
|
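For anyone following along, the edit discussed above could be sketched as a small shell function. This is only an illustration: the name `disable_dos_protection` is made up here, and it assumes the option appears in the file as a (possibly commented-out) `dos_protection = ...` line, which may not hold for every strongSwan version. It keeps a `.orig` backup next to the file.

```shell
# Hypothetical helper: set dos_protection to "no" in a charon.conf-style file.
# Assumes the option is present as "dos_protection = ..." or "# dos_protection = ...".
disable_dos_protection() {
    conf="$1"
    # Keep a backup next to the original file.
    cp "$conf" "$conf.orig" || return 1
    # Rewrite the (possibly commented-out) option line, preserving indentation.
    sed -E 's/^([[:space:]]*)#?[[:space:]]*dos_protection[[:space:]]*=.*$/\1dos_protection = no/' \
        "$conf.orig" > "$conf"
}

# Example (on the server, as root), followed by a strongSwan restart:
#   disable_dos_protection /etc/strongswan.d/charon.conf
#   ipsec restart
```

Working from a backup copy means the original file can be restored by moving the `.orig` file back if the change does not help.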
@QuentinMoss This solved the connection issues for me as well... great find! |
Like @QuentinMoss I've had connect/disconnect loops on a deployment using the |
Thanks folks. Need to get it covered in the docs |
How about #1042? |
Just a follow-up: even with dos_protection disabled I still get the connect/disconnect loops on occasion. I can post a log if someone gives me an idea of what to grep for; charon seems to post about 200 entries for each connect/disconnect loop. |
I also had a connect/disconnect loop with
The problem wouldn't clear until I restarted strongswan. The problem server is on DO. The iOS device was still able to connect to an older Algo server I have on EC2. I configure my servers with Edited to add: I've seen other odd iOS networking problems since iOS 11.4.1 was released on 2018-07-09. Maybe the connect/disconnect loop is related. |
@TC1977 and @davidemyers email me please! dan trailofbits |
@davidemyers I scanned my logs and also found similar entries for the period of time in question:
Maybe this is more of a strongswan problem than an Algo install problem? Do they have similar reports in the strongswan docs? I'll try googling. |
Check out this issue which sounds similar: https://wiki.strongswan.org/issues/431 Going by the first solution suggested, I inserted the line |
I think this issue sounds more like what we're seeing: https://wiki.strongswan.org/issues/2607 I'm trying |
@davidemyers Interesting that in that issue, though, the guy had The next thing I'll try, if "reauth=no" doesn't work, will be to enable "make_before_break" in /etc/strongswan.d/strongswan.conf. Of course it might then mess up non-macOS or non-iOS devices that have a different IKEv2 implementation. (maybe that explains why you've seen networking errors since iOS 11.4.1?) |
When I had a device in a connect loop, I ran If your server was deployed in May it's using the old cipher suite, and you appear to be on AWS. I'm testing on DO with an I also have an old cipher suite server on EC2 deployed in late May and I've never seen this problem there. Weird. |
You're correct; I'm on the old cipher suite, on AWS. Also on checking my logs tonight, it appears I continued to have "unable to install policy" errors, and old policies upon checking |
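A rough way to check for the errors mentioned above is to count matching lines in the server log. The sketch below is an illustration under assumptions: the function name is made up, and the log path in the example (`/var/log/syslog`) is only a guess at where charon logs on a given server.

```shell
# Hypothetical helper: count log lines containing "unable to install policy".
count_policy_errors() {
    # grep -c prints the number of matching lines (0 if none);
    # "|| true" keeps the function from failing when the count is zero.
    grep -c 'unable to install policy' "$1" || true
}

# Example (log path is an assumption):
#   count_policy_errors /var/log/syslog
# Installed kernel policies can be listed with:
#   ip xfrm policy
```

Comparing the count before and after a configuration change gives a simple signal of whether the change made a difference.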
So after two days of running with "auto=route" in Edit: changed filename above. |
@TC1977 so which charon.conf do you feel shows the best results or the most promise? I don't mind setting up a few droplets with different settings to get to the bottom of this. |
I, on the other hand, have had none of those error messages, have no orphaned policies, and have yet to have a reconnect loop. But I don't think it's been long enough yet to declare Did you flush the policies before testing Sometimes when I've had problems getting an iOS device to connect (but not when it's looping) I find it helps to toggle Wi-Fi off and on. |
@davidemyers, question for you: I'm doing some reading and it seems that
Are you getting any similar messages if you |
I don't have any of those messages in You don't have to deploy a new server to try |
Ok, I just caught it! I had ssh open and was running
And
where |
@digeratus @davidemyers I've been running with the current setup for four days now without any drops or reconnection loops. Anecdotally, I notice that the VPN stays connected much longer, and I don't get any "leaks" where the phones are checking mail from their regular (cell tower) IPs rather than the VPN. The only issue now is a whole ton of
Current
Current |
I've now gone 45 days without a reconnection loop. Since we now know that the
Clients appear stuck in a reconnection loop
If you're using 'Connect on Demand' on iOS or macOS and your client device appears stuck in a reconnection loop while trying to connect to the VPN, the following changes to the default IPsec configuration might help.
PLEASE NOTE: In order to use this particular configuration, every device must connect as a unique Algo user (as defined by
Make the following changes on the Algo server:
Here are the changes above as shell commands:
# This Perl command will create a backup copy of /etc/ipsec.conf named
# /etc/ipsec.conf.orig
sudo perl -p -i.orig -e 's/uniqueids=never.*$/uniqueids=yes/;' \
-e 's/auto=add/auto=route/;' /etc/ipsec.conf
# Restart IPsec after flushing the xfrm policies
sudo ipsec stop; sudo ip xfrm policy flush; sudo ipsec start |
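To confirm the substitutions took effect, something like the following check could be run against the edited file. It is only a sketch: `check_ipsec_conf` is a made-up name, and it simply looks for the two settings changed by the Perl command above.

```shell
# Hypothetical helper: verify that an ipsec.conf-style file has both changes applied.
check_ipsec_conf() {
    conf="$1"
    # uniqueids must now be "yes" ...
    grep -Eq '^[[:space:]]*uniqueids=yes' "$conf" || { echo "uniqueids is not yes"; return 1; }
    # ... at least one line must set auto=route ...
    grep -Eq '^[[:space:]]*auto=route' "$conf" || { echo "auto=route not found"; return 1; }
    # ... and no auto=add should remain.
    if grep -Eq '^[[:space:]]*auto=add' "$conf"; then
        echo "auto=add still present"; return 1
    fi
    echo "ok"
}

# Example:
#   check_ipsec_conf /etc/ipsec.conf
```

If the check fails, the backup at /etc/ipsec.conf.orig created by the Perl command can be compared against the edited file to see which substitution did not match.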
@davidemyers Sounds good to me. I've been traveling for the last few days and although the loops aren't as bad as before, I've had to go in and restart the server a couple of times. It seems the biggest problems are associated with hotel and other public Wi-Fis with captive portal login pages. The first connection will go through fine, but after the iPhone goes to sleep and disconnects, it has a hell of a time logging back in and connecting to the Algo server. I'm not sure if this is the same problem, though. I'll check out the logs when I get back home. |
Aaargh! I just had a reconnect loop. So with regard to my previous post: never mind. |
Ok @davidemyers, so I finally have time to try it your way. I've downloaded the latest Algo commit 399d472 and installed onto a brand new AWS instance, encrypted, connect on demand Wi-Fi and LTE, Wireguard disabled, dnscrypt-proxy and dnsmasq on. I created a separate .mobileconfig for each device, changed
This is the error message I've usually received when trying |
Well, that didn't take long. The problem is even worse in a way, because with
I've rebooted the server, loaded higher level logging settings in |
As expected, it failed overnight, and therefore freaked out this morning. Nothing useful in the logs that we haven't seen before. I've restarted the server with a couple of other options enabled in
I was hesitant to use |
So after having the above config for a few hours, plus
|
Similar failure today, after no failures yesterday. I'm now getting rid of
|
Has anyone tried configuring rekeying properly? Seems that's the cause of everything here |
I'd rather file an issue on the strongSwan bug tracker, or has someone done that already? |
The symptoms we're seeing are similar to those already reported here: https://wiki.strongswan.org/issues/2607 I've not added to that issue as I don't feel I really understand what's going on. |
I don't understand what's going on at all. I had another hard connect/reconnect loop 2 days ago, but didn't see any Ultimately I think solving the problem will also require editing the settings in the client config, which would require also editing the .mobileconfig in Apple Configurator and reinstalling with every iteration. Is this problem only seen with Apple devices, and only when "Connect on Demand" is enabled? |
If anyone is still following this, please try this. Next time you get a reconnect loop, try Meanwhile I've modified my ipsec.conf and the mobileconfigs further, with some success, but maybe at this point I'm better off putting these into a separate branch to track the changes more easily. |
I've been running the same config for one week at this point with no stale policies and no connect/reconnect loops. Here's my
Here's my
In addition to this, I've also changed my mobileconfig to change the @digeratus Maybe you want to test this config out? I'd like to know if this config plays well with non-Apple clients. |
@TC1977 I just got back. Will test today |
Still no further "policy failed" errors after another week, running with The changes are available at TC1977/algo, with one exception. The
|
@TC1977 Seems like things are ok on this front. Has anyone else tested? |
OS / Environment (where do you run Algo on)
Ubuntu 16.04
Cloud Provider (where do you deploy Algo to)
DigitalOcean
Summary of the problem
Devices are sometimes stuck "connecting" randomly. When one device is experiencing the problem other devices are still working fine. The device is usually stuck for about 3-5 minutes and then everything returns to normal.
Steps to reproduce the behavior