Stuck connecting #963

Closed
Omoeba opened this issue May 24, 2018 · 71 comments · Fixed by #1379

Comments

@Omoeba

Omoeba commented May 24, 2018

OS / Environment (where do you run Algo on)

Ubuntu 16.04

Cloud Provider (where do you deploy Algo to)

DigitalOcean

Summary of the problem

Devices are sometimes stuck "connecting" randomly. When one device is experiencing the problem other devices are still working fine. The device is usually stuck for about 3-5 minutes and then everything returns to normal.

Steps to reproduce the behavior

  1. Set up Algo on a Mac
  2. Change users in config.cfg
  3. Enable everything except keeping the CA and support for Windows and Linux

@davidemyers
Contributor

This might be a problem with your router. See #520 and #727 for suggestions on how to fix it.

@Omoeba
Author

Omoeba commented May 24, 2018

Update: the problem occurs on three different Wi-Fi networks as well as on cellular.

@notDavid

I'm not sure if this is related, just posting FYI: I have noticed this issue on my iPhone when switching from Wi-Fi to 4G (when I go just out of reach of the Wi-Fi network).
My iPhone then sometimes goes completely offline (I have Connect On Demand enabled), and if I go to Settings -> VPN I see Status = Connecting...

It stays stuck like this forever, until I toggle the "Status" switch to "Off" and back to "On"; then it says "Connected" and everything works again.

(I should note that the Algo server I am running was deployed a long time ago, in December 2017, on AWS EC2.)

@QuentinMoss
Contributor

QuentinMoss commented Jul 22, 2018

@notDavid I'm having the same problem. I deployed the ansible2.5 branch, but no improvement.

I've even had the disconnect / reconnect loop on my MBA recently. I'm not sure what the problem is. It almost feels like some type of session timeout issue. If I leave the VPN disconnected for a few hours it will allow me to reconnect.

Is it possible that when the device disconnects, the session is not properly terminated on the VPN server?

@QuentinMoss
Contributor

QuentinMoss commented Jul 23, 2018

@notDavid I disabled charon.dos_protection and I haven't had the problem all weekend.

@notDavid

@QuentinMoss Thanks for sharing that! I've disabled charon.dos_protection as well; let's see if the problem recurs in the next week...

@digeratus

@notDavid @QuentinMoss I also have a test server using the ansible2.5 branch. Where is the charon.dos_protection setting so I can disable it? In strongswan.conf? And then restart?

@notDavid

@digeratus In the file /etc/strongswan.d/charon.conf, search for dos_protection.
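
For readers who prefer a command, the edit described above can be sketched as a small shell helper. This is only an illustration: `disable_dos_protection` is a hypothetical function name, and it assumes the file contains a `dos_protection = ...` line (possibly commented out), as the stock charon.conf is expected to.

```shell
# Sketch only: disable_dos_protection is a hypothetical helper, not part of Algo.
# It assumes the config file already has a (possibly commented-out)
# "dos_protection = ..." line.
disable_dos_protection() {
    conf="${1:-/etc/strongswan.d/charon.conf}"
    # Rewrite the dos_protection line whether or not it is commented out,
    # keeping its indentation. The original file is saved as "$conf.orig".
    sed -E -i.orig \
        's/^([[:space:]]*)#?[[:space:]]*dos_protection[[:space:]]*=.*/\1dos_protection = no/' \
        "$conf"
}
```

Run it as root on the server, then restart strongSwan (for example with `sudo ipsec restart`) so charon picks up the change.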

@digeratus

digeratus commented Jul 25, 2018 via email

@notDavid

> I disabled charon.dos_protection and I haven't had the problem all weekend.

@QuentinMoss This solved the connection issues for me as well... great find!

@davidemyers
Contributor

Like @QuentinMoss I've had connect/disconnect loops on a deployment using the ansible2.5 branch, in my case with an iPad. I've also not seen the problem with dos_protection disabled.

@jackivanov
Collaborator

jackivanov commented Jul 29, 2018

Thanks folks. Need to get it covered in the docs

@QuentinMoss
Contributor

How about #1042?

@TC1977
Contributor

TC1977 commented Aug 15, 2018

Just a follow-up: even with dos_protection disabled, I still get the connect/disconnect loops on occasion. I can post a log if someone gives me an idea of what to grep for; charon seems to post about 200 entries for each connect/disconnect loop.

@davidemyers
Contributor

davidemyers commented Aug 15, 2018

I also had a connect/disconnect loop with dos_protection off. When it happens I see log entries like this:

Aug 14 13:53:29 vpn4 charon[13590]: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 943, the same policy for reqid 16 exists
Aug 14 13:53:29 vpn4 charon[13590]: 08[IKE] unable to install IPsec policies (SPD) in kernel

The problem wouldn't clear until I restarted strongswan. The problem server is on DO. The iOS device was still able to connect to an older Algo server I have on EC2.
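
To see at a glance which stale reqid is blocking which virtual IP, log lines like the ones above can be condensed with a short filter. This is a sketch under the assumption that the messages keep exactly the wording shown in this thread; `summarize_policy_conflicts` is a hypothetical helper name.

```shell
# Sketch only: summarize_policy_conflicts is a hypothetical helper.
# It reads charon log lines on stdin and assumes the message wording
# "unable to install policy A === B out for reqid N, the same policy
# for reqid M exists" seen in this thread.
summarize_policy_conflicts() {
    sed -n -E 's/.*unable to install policy .* === ([^ ]+) out for reqid [0-9]+, the same policy for reqid ([0-9]+) exists.*/\1 \2/p' |
        sort | uniq -c |
        awk '{ printf "%s blocked %d time(s) by stale reqid %s\n", $2, $1, $3 }'
}
```

Typical use would be `grep charon /var/log/syslog | summarize_policy_conflicts`; a single reqid that shows up with a high count is the one to look for in `ip xfrm policy list`.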

I configure my servers with uniqueids=yes in /etc/ipsec.conf so my setup is not quite the same as other Algo users. I also use the DO firewall.

Edited to add: I've seen other odd iOS networking problems since iOS 11.4.1 was released on 2018-07-09. Maybe the connect/disconnect loop is related.

@dguido
Member

dguido commented Aug 15, 2018

@TC1977 and @davidemyers email me please! dan trailofbits

@TC1977
Contributor

TC1977 commented Aug 17, 2018

@davidemyers I scanned my logs and also found similar entries for the period of time in question:

Aug 14 16:58:53 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.2/32 out for reqid 26, the same policy for reqid 24 exists
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[IKE] unable to install IPsec policies (SPD) in kernel
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[IKE] failed to establish CHILD_SA, keeping IKE_SA
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] deleting policy 0.0.0.0/0 === 10.19.48.2/32 out
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] policy still used by another CHILD_SA, not removed
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] not updating policy 0.0.0.0/0 === 10.19.48.2/32 out [priority 383615, refcount 1]
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] deleting policy 10.19.48.2/32 === 0.0.0.0/0 in
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] deleting policy 10.19.48.2/32 === 0.0.0.0/0 fwd
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] deleting policy ::/0 === fd9d:bc11:4020::2/128 out
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] policy still used by another CHILD_SA, not removed
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] not updating policy ::/0 === fd9d:bc11:4020::2/128 out [priority 334463, refcount 1]

Maybe this is more of a strongswan problem than an Algo install problem? Do they have similar reports in the strongswan docs? I'll try googling.

@TC1977
Contributor

TC1977 commented Aug 17, 2018

Check out this issue which sounds similar: https://wiki.strongswan.org/issues/431

Going by the first solution suggested there, I inserted the line reauth=no in /etc/ipsec.conf after rekey=no. I have no idea if this will work, or if it will completely screw up security. Could someone who knows what they're doing please comment? Anyway, I'll try it like this and see if I get the loop again. At the very least, I can still connect to the server and I can still browse the web from the client.

@davidemyers
Contributor

I think this issue sounds more like what we're seeing: https://wiki.strongswan.org/issues/2607

I'm trying auto=route instead of auto=add as suggested by the issue reporter to see if that helps.

@TC1977
Contributor

TC1977 commented Aug 18, 2018

@davidemyers Interesting that in that issue, though, the guy had rekey=yes and reauth=yes. Also I don't have any XFRMA messages or "trap not found" messages in my error logs. Just to compare notes, I'm running a version of Algo from late May (not the ansible2.5 branch), and 'ipsec version' gives me 'Linux strongSwan U5.6.2/K4.15.0-1019-aws'.

The next thing I'll try, if "reauth=no" doesn't work, will be to enable "make_before_break" in /etc/strongswan.d/strongswan.conf. Of course it might then mess up non-macOS or non-iOS devices that have a different IKEv2 implementation. (maybe that explains why you've seen networking errors since iOS 11.4.1?)

@davidemyers
Contributor

When I had a device in a connect loop, I ran ip xfrm policy list and found entries for the conflicting reqid that should have been deleted (there were out entries with no matching in or fwd entries). I don't really understand what this means, but I thought it could be the same symptom being reported in issue 2607. I made sure the bad entries were cleaned out by running ipsec stop; ip xfrm policy flush; ipsec start.
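
The check described above (out entries with no matching in or fwd entries) can be automated with a small awk filter. This is a rough sketch, assuming the `ip xfrm policy list` output layout posted elsewhere in this thread (a `dir ...` line followed by a `proto esp ... reqid N` line); `find_orphaned_reqids` is a hypothetical helper name.

```shell
# Sketch only: find_orphaned_reqids is a hypothetical helper.
# It reads "ip xfrm policy list" output on stdin and prints every reqid
# that has an "out" policy but no "in" or "fwd" policy.
find_orphaned_reqids() {
    awk '
        # Remember the direction of the policy currently being read.
        $1 == "dir" { dir = $2 }
        {
            # A template line carries "reqid N"; file it under its direction.
            for (i = 1; i < NF; i++)
                if ($i == "reqid") {
                    if (dir == "out") out[$(i + 1)] = 1
                    else              inbound[$(i + 1)] = 1
                }
        }
        END {
            for (r in out)
                if (!(r in inbound)) print r
        }
    ' | sort -n
}
```

It would be used as `sudo ip xfrm policy list | find_orphaned_reqids`. An empty result means every outbound policy has a matching inbound or forward policy.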

If your server was deployed in May it's using the old cipher suite, and you appear to be on AWS. I'm testing on DO with an ansible2.5 branch server with the new cipher suite. I assume you're using the Algo default of uniqueids=never while I'm using the strongSwan default of uniqueids=yes. So that gives us a few things we can rule out.

I also have an old cipher suite server on EC2 deployed in late May and I've never seen this problem there. Weird.

@TC1977
Contributor

TC1977 commented Aug 19, 2018

You're correct; I'm on the old cipher suite, on AWS. Also on checking my logs tonight, it appears I continued to have "unable to install policy" errors, and old policies upon checking sudo ip xfrm policy list. I didn't notice any problems connecting, though. I'll delete the "reauth=no" line in /etc/ipsec.conf, and try the "auto=route" line.

@TC1977
Contributor

TC1977 commented Aug 21, 2018

So after two days of running with "auto=route" in ipsec.conf and reviewing syslog, I continue to get "unable to install policy" errors, and sudo ip xfrm policy list continues to show dead policies. Here's the thing, though. I'm not sure this is causing any problems that I've noticed, and I definitely haven't seen any "Connecting..." "Disconnecting..." loops. My wife had a lot of problems connecting with her iPhone, but I just reinstalled the mobileconfig via Airdrop, and now it works fine. So I'm at a loss as to whether these errors actually correspond to something the end user will notice. Just for the hell of it, I've added reauth=no back into ipsec.conf, as well as lifetime=1h to see if it makes a difference.

Edit: changed filename above.

@digeratus

@TC1977 So which charon.conf settings do you feel show the best results or the most promise? I don't mind setting up a few droplets with different settings to get to the bottom of this.

@davidemyers
Contributor

I, on the other hand, have had none of those error messages, have no orphaned policies, and have yet to have a reconnect loop. But I don't think it's been long enough yet to declare auto=route a fix.

Did you flush the policies before testing auto=route like I mentioned in my previous message?

Sometimes when I've had problems getting an iOS device to connect (but not when it's looping) I find it helps to toggle Wi-Fi off and on.

@TC1977
Contributor

TC1977 commented Sep 19, 2018

@davidemyers, question for you: I'm doing some reading and it seems that auto=route isn't recommended with right=%any. See this link, this link, and this link. My current config has been working well, without any drops and switching from LTE to Wifi without problems, but I'm getting a ton of these messages:

Sep 18 22:20:36 ip-172-16-254-145 charon: 05[CFG] installing trap failed, remote address unknown
Sep 18 22:41:14 ip-172-16-254-145 charon: 16[CFG] installing trap failed, remote address unknown
Sep 18 22:54:07 ip-172-16-254-145 charon: 11[CFG] installing trap failed, remote address unknown
Sep 18 23:08:23 ip-172-16-254-145 charon: 14[CFG] installing trap failed, remote address unknown
Sep 18 23:43:43 ip-172-16-254-145 charon: 14[CFG] installing trap failed, remote address unknown
Sep 18 23:54:06 ip-172-16-254-145 charon: 13[CFG] installing trap failed, remote address unknown
Sep 19 00:01:58 ip-172-16-254-145 charon: 10[CFG] installing trap failed, remote address unknown
Sep 19 00:07:17 ip-172-16-254-145 charon: 13[CFG] installing trap failed, remote address unknown
Sep 19 00:20:40 ip-172-16-254-145 charon: 11[CFG] installing trap failed, remote address unknown
Sep 19 00:28:21 ip-172-16-254-145 charon: 11[CFG] installing trap failed, remote address unknown
Sep 19 00:42:53 ip-172-16-254-145 charon: 11[CFG] installing trap failed, remote address unknown
Sep 19 00:54:10 ip-172-16-254-145 charon: 07[CFG] installing trap failed, remote address unknown
Sep 19 01:08:20 ip-172-16-254-145 charon: 05[CFG] installing trap failed, remote address unknown
Sep 19 01:20:17 ip-172-16-254-145 charon: 14[CFG] installing trap failed, remote address unknown
Sep 19 01:32:27 ip-172-16-254-145 charon: 09[CFG] installing trap failed, remote address unknown
Sep 19 01:39:01 ip-172-16-254-145 charon: 14[CFG] installing trap failed, remote address unknown
Sep 19 01:45:34 ip-172-16-254-145 charon: 12[CFG] installing trap failed, remote address unknown
Sep 19 02:05:27 ip-172-16-254-145 charon: 12[CFG] installing trap failed, remote address unknown

Are you getting any similar messages if you grep trap /var/log/syslog?

@davidemyers
Contributor

I don't have any of those messages in syslog. I don't know what to make of those strongSwan issues but I'm not having any issues with my configuration, now at 32 days of uptime. The two iOS devices on my test server are now on iOS 12 and working fine.

You don't have to deploy a new server to try uniqueids=yes as long as all of your devices are already using different mobileconfigs.

@TC1977
Contributor

TC1977 commented Sep 19, 2018

Ok, I just caught it! I had ssh open and was running tail -f /var/log/syslog|grep charon, watching for messages. I had dpdaction=hold as above, and auto=add, but in an effort to get rid of the installing trap failed messages, I switched it back to dpdaction=clear, and ran sudo ipsec reload. I had two iOS clients active on LTE, using two separate mobileconfigs, sending out DPD requests every few seconds. Then I got this:

Sep 19 14:20:30 ip-172-16-254-145 charon: 07[NET] sending packet: from 172.16.254.145[4500] to xxx.xxx.xxx.xxx[4500] (113 bytes)
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[NET] received packet: from xxx.xxx.xxx.xxx[4500] to 172.16.254.145[4500] (72 bytes)
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[ENC] parsed INFORMATIONAL request 80 [ D ]
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[IKE] received DELETE for IKE_SA ikev2-pubkey[19]
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[IKE] deleting IKE_SA ikev2-pubkey[19] between 172.16.254.145[52.22.108.80]... xxx.xxx.xxx.xxx[user1]
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[IKE] IKE_SA deleted
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[ENC] generating INFORMATIONAL response 80 [ ]
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[NET] sending packet: from 172.16.254.145[4500] to xxx.xxx.xxx.xxx[4500] (57 bytes)
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[CFG] lease fd9d:bc11:4020::4 by 'user1' went offline
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[CFG] lease 10.19.48.4 by 'user1' went offline
Sep 19 14:20:31 ip-172-16-254-145 charon: 15[KNL] creating acquire job for policy 40.97.145.146/32[tcp/https] === 10.19.48.4/32[tcp/54237] with reqid {14}
Sep 19 14:20:31 ip-172-16-254-145 charon: 15[CFG] trap not found, unable to acquire reqid 14

And sudo ip xfrm pol list shows this:

src ::/0 dst fd9d:bc11:4020::4/128 
	dir out priority 334463 
	tmpl src 172.16.254.145 dst xxx.xxx.xxx.xxx
		proto esp spi 0x0633e4b6 reqid 14 mode tunnel
src 0.0.0.0/0 dst 10.19.48.4/32 
	dir out priority 383615 
	tmpl src 172.16.254.145 dst xxx.xxx.xxx.xxx
		proto esp spi 0x0633e4b6 reqid 14 mode tunnel
src ::/0 dst fd9d:bc11:4020::1/128 
	dir out priority 334463 
	tmpl src 172.16.254.145 dst yyy.yyy.yyy.yyy
		proto esp spi 0x08a2ae61 reqid 15 mode tunnel
src fd9d:bc11:4020::1/128 dst ::/0 
	dir fwd priority 334463 
	tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
		proto esp reqid 15 mode tunnel
src fd9d:bc11:4020::1/128 dst ::/0 
	dir in priority 334463 
	tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
		proto esp reqid 15 mode tunnel
src 0.0.0.0/0 dst 10.19.48.1/32 
	dir out priority 383615 
	tmpl src 172.16.254.145 dst yyy.yyy.yyy.yyy
		proto esp spi 0x08a2ae61 reqid 15 mode tunnel
src 10.19.48.1/32 dst 0.0.0.0/0 
	dir fwd priority 383615 
	tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
		proto esp reqid 15 mode tunnel
src 10.19.48.1/32 dst 0.0.0.0/0 
	dir in priority 383615 
	tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
		proto esp reqid 15 mode tunnel
src 0.0.0.0/0 dst 0.0.0.0/0 
	socket in priority 0 
src 0.0.0.0/0 dst 0.0.0.0/0 
	socket out priority 0 
src 0.0.0.0/0 dst 0.0.0.0/0 
	socket in priority 0 
src 0.0.0.0/0 dst 0.0.0.0/0 
	socket out priority 0 
src ::/0 dst ::/0 
	socket in priority 0 
src ::/0 dst ::/0 
	socket out priority 0 
src ::/0 dst ::/0 
	socket in priority 0 
src ::/0 dst ::/0 
	socket out priority 0 

where xxx.xxx.xxx.xxx is the IP of the phone that just got orphaned, and yyy.yyy.yyy.yyy is the IP of the other phone. So I think it's an issue where dpdaction=clear isn't working properly. I'm on auto=add btw. I'm going to switch back to dpdaction=hold and see if that helps.
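
Since this thread involves flipping one ipsec.conf option at a time (dpdaction, auto, uniqueids and so on), a small helper can make each experiment repeatable. This is a sketch only: `set_ipsec_option` is a hypothetical name, it only rewrites an option that is already present in the file, and it assumes the key and value contain no `|` or regex metacharacters.

```shell
# Sketch only: set_ipsec_option is a hypothetical helper.
# Usage: set_ipsec_option KEY VALUE [FILE]
# Rewrites an existing "KEY=..." line, keeping its indentation, and
# saves the previous file as FILE.orig. It does not add missing keys.
set_ipsec_option() {
    key="$1"
    value="$2"
    conf="${3:-/etc/ipsec.conf}"
    sed -E -i.orig "s|^([[:space:]]*)${key}=.*|\1${key}=${value}|" "$conf"
}
```

For the change described above, that would be `set_ipsec_option dpdaction hold` run as root, followed by `sudo ipsec reload`.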

@TC1977
Contributor

TC1977 commented Sep 24, 2018

@digeratus @davidemyers I've been running with the current setup for four days now without any drops or reconnection loops. Anecdotally, I notice that the VPN stays connected much longer, and I don't get any "leaks" where the phones are checking mail from their regular (cell tower) IPs rather than the VPN. The only issue now is a whole ton of installing trap failed, remote address unknown messages, but I can live with that. The two iPhones are on iOS 12.0 and 11.4.1.

ubuntu@ip-172-16-254-145:~$ sudo ipsec statusall | head -2
Status of IKE charon daemon (strongSwan 5.6.2, Linux 4.15.0-1021-aws, x86_64):
  uptime: 4 days, since Sep 19 21:42:41 2018
ubuntu@ip-172-16-254-145:~$ journalctl | grep unable | tail -1
Sep 19 20:26:22 ip-172-16-254-145 ipsec[834]: 13[CFG] trap not found, unable to acquire reqid 29

Current /etc/ipsec.conf:

config setup
    uniqueids=never # allow multiple connections per user
    charondebug="ike 1, knl 1, cfg 1, net 1, esp 1, dmn 1,  mgr 1"

conn %default
    fragmentation=yes
    rekey=no
    forceencaps=yes
    dpdaction=hold
    keyexchange=ikev2
    compress=yes
    dpddelay=35s
    inactivity=3600s
    ikelifetime=28800s

    ike=aes256gcm16-prfsha512-ecp384,aes256-sha2_512-prfsha512-ecp384,aes256-sha2_384-prfsha384-ecp384!
    esp=aes256gcm16-ecp384,aes256-sha2_512-prfsha512-ecp384!

    left=%any
    leftauth=pubkey
    leftid=[my.algo.ip]
    leftcert=[my.algo.ip].crt
    leftsendcert=always
    leftsubnet=0.0.0.0/0,::/0

    right=%any
    rightauth=pubkey
    rightsourceip=10.19.48.0/24,fd9d:bc11:4020::/48
    rightdns=172.16.0.1

conn ikev2-pubkey
    auto=add

Current /etc/strongswan.d/charon.conf settings are delete_rekeyed_delay = 10 and keep_alive = 25s.
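
To compare charon.conf settings between servers without posting the whole file, the active (uncommented) settings can be listed with a short filter. This is a sketch; `show_active_settings` is a hypothetical helper name, and it assumes settings are written one per line as `key = value`.

```shell
# Sketch only: show_active_settings is a hypothetical helper.
# Prints every uncommented "key = value" line from a strongSwan
# config file, with leading indentation removed.
show_active_settings() {
    conf="${1:-/etc/strongswan.d/charon.conf}"
    grep -E '^[[:space:]]*[A-Za-z_]+[[:space:]]*=' "$conf" |
        sed -E 's/^[[:space:]]+//'
}
```

On a server configured as described above, `show_active_settings` would be expected to list the delete_rekeyed_delay and keep_alive lines, plus anything else that has been uncommented.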

@davidemyers
Contributor

I've now gone 45 days without a reconnection loop. Since we now know that the dos_protection change isn't enough to solve the problem, I propose replacing the section of the Troubleshooting document added by @QuentinMoss with the following:


Clients appear stuck in a reconnection loop

If you're using 'Connect on Demand' on iOS or macOS and your client device appears stuck in a reconnection loop while trying to connect to the VPN, the following changes to the default IPsec configuration might help.

PLEASE NOTE: In order to use this particular configuration, every device must connect as a unique Algo user (as defined by users in config.cfg).

Make the following changes on the Algo server:

  1. Edit /etc/ipsec.conf:
    • Change uniqueids=never to uniqueids=yes (near the top of the file)
    • Change auto=add to auto=route (near the bottom of the file)
  2. Restart IPsec and flush the xfrm policies:
    • sudo ipsec stop
    • sudo ip xfrm policy flush
    • sudo ipsec start

Here are the changes above as shell commands:

# This Perl command will create a backup copy of /etc/ipsec.conf named
# /etc/ipsec.conf.orig
sudo perl -p -i.orig -e 's/uniqueids=never.*$/uniqueids=yes/;' \
	-e 's/auto=add/auto=route/;' /etc/ipsec.conf

# Restart IPsec after flushing the xfrm policies
sudo ipsec stop; sudo ip xfrm policy flush; sudo ipsec start

@TC1977
Contributor

TC1977 commented Oct 2, 2018

@davidemyers Sounds good to me. I've been traveling for the last few days and although the loops aren't as bad as before, I've had to go in and restart the server a couple of times. It seems the biggest problems are associated with hotel and other public Wi-Fis with captive portal login pages. The first connection will go through fine, but after the iPhone goes to sleep and disconnects, it has a hell of a time logging back in and connecting to the Algo server. I'm not sure if this is the same problem, though. I'll check out the logs when I get back home.

@davidemyers
Contributor

Aaargh! I just had a reconnect loop. So in regards to my previous post:

Never mind.

@TC1977
Contributor

TC1977 commented Nov 2, 2018

Ok @davidemyers, so I finally have time to try it your way. I've downloaded the latest Algo commit 399d472 and installed it onto a brand new AWS instance: encrypted, Connect On Demand for Wi-Fi and LTE, WireGuard disabled, dnscrypt-proxy and dnsmasq on. I created a separate .mobileconfig for each device, changed /etc/ipsec.conf to match your config of uniqueids=yes and auto=route, and ran sudo ipsec reload. One thing I noticed right off the bat was this error message:

Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG] reusing virtual IP address pool 10.19.48.0/24
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG] virtual IP pool too large, limiting to fd9d:bc11:4020::/97
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG] reusing virtual IP address pool fd9d:bc11:4020::/48
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG]   loaded certificate "CN=54.82.89.174" from '54.82.89.174.crt'
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG] added configuration 'ikev2-pubkey'
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 08[CFG] received stroke: route 'ikev2-pubkey'
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 08[CFG] installing trap failed, remote address unknown
Nov 01 22:46:28 ip-172-16-254-163 ipsec_starter[6788]: routing 'ikev2-pubkey' failed
Nov 01 22:46:28 ip-172-16-254-163 ipsec_starter[6788]: 

This is the error message I've usually received when trying auto=route. We'll see how it goes.

jackivanov reopened this Nov 2, 2018

@TC1977
Contributor

TC1977 commented Nov 3, 2018

Well, that didn't take long. The problem is even worse in a way, because with uniqueids=yes the strongSwan server repeatedly tries to assign the same (stale) IP address to the client, with continuing errors. Check this out:

ubuntu@ip-172-16-254-163:~$ grep unable /var/log/syslog|grep charon
Nov  3 09:07:50 ip-172-16-254-163 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 46, the same policy for reqid 45 exists
Nov  3 09:07:50 ip-172-16-254-163 charon: 10[IKE] unable to install IPsec policies (SPD) in kernel
Nov  3 09:07:51 ip-172-16-254-163 charon: 06[CFG] trap not found, unable to acquire reqid 45
Nov  3 09:07:58 ip-172-16-254-163 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 47, the same policy for reqid 45 exists
Nov  3 09:07:58 ip-172-16-254-163 charon: 05[IKE] unable to install IPsec policies (SPD) in kernel
Nov  3 09:07:59 ip-172-16-254-163 charon: 11[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 48, the same policy for reqid 45 exists
Nov  3 09:07:59 ip-172-16-254-163 charon: 11[IKE] unable to install IPsec policies (SPD) in kernel
...
Nov  3 09:15:48 ip-172-16-254-163 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 366, the same policy for reqid 45 exists
Nov  3 09:15:48 ip-172-16-254-163 charon: 06[IKE] unable to install IPsec policies (SPD) in kernel
Nov  3 09:15:49 ip-172-16-254-163 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 367, the same policy for reqid 45 exists
Nov  3 09:15:49 ip-172-16-254-163 charon: 08[IKE] unable to install IPsec policies (SPD) in kernel
ubuntu@ip-172-16-254-163:~$ sudo ip xfrm pol list
src ::/0 dst fd9d:bc11:4020::1/128 
	dir out priority 334463 
	tmpl src 172.16.254.163 dst xxx.xxx.xxx.xxx
		proto esp spi 0x0163ef20 reqid 45 mode tunnel
src 0.0.0.0/0 dst 10.19.48.1/32 
	dir out priority 383615 
	tmpl src 172.16.254.163 dst xxx.xxx.xxx.xxx
		proto esp spi 0x0163ef20 reqid 45 mode tunnel
src 0.0.0.0/0 dst 0.0.0.0/0 
	socket in priority 0 
src 0.0.0.0/0 dst 0.0.0.0/0 
	socket out priority 0 
src 0.0.0.0/0 dst 0.0.0.0/0 
	socket in priority 0 
src 0.0.0.0/0 dst 0.0.0.0/0 
	socket out priority 0 
src ::/0 dst ::/0 
	socket in priority 0 
src ::/0 dst ::/0 
	socket out priority 0 
src ::/0 dst ::/0 
	socket in priority 0 
src ::/0 dst ::/0 
	socket out priority 0 

I've rebooted the server, loaded higher level logging settings in /etc/ipsec.conf, and will post logs if and when it happens again.

@TC1977
Contributor

TC1977 commented Nov 4, 2018

As expected, it failed overnight, and therefore freaked out this morning. Nothing useful in the logs that we haven't seen before.

I've restarted the server with a few other options enabled in /etc/strongswan.d/charon.conf to try to get those CHILD_SAs closed.

close_ike_on_child_failure = yes
keep_alive = 25s
make_before_break = yes

I was hesitant to use make_before_break before, even though it was specifically mentioned in strongswan issue 2607, because I noticed a performance hit right after enabling it. But at this point I just care more about stability than anything else.

@TC1977
Contributor

TC1977 commented Nov 5, 2018

So after having the above config for a few hours, plus inactivity=3600s and ikelifetime=28800s in ipsec.conf, I've already run into another failure to delete a policy, with the resulting connecting/reconnecting loops. But this was yet another error message I hadn't seen before. You can also see that close_ike_on_child_failure didn't actually close the duplicate outgoing policy either. Googling "not enough input to parse rule 0 U_INT_8" led me to strongSwan issue #2438, which seems related to lifetime issues. So I've deleted inactivity=3600s and ikelifetime=28800s and restarted.

Nov  4 17:37:52 ip-172-16-254-163 charon: 12[NET] received unencrypted informational: from xxx.xxx.xxx.103[4500] to 172.16.254.163[4500]
Nov  4 17:37:52 ip-172-16-254-163 charon: 12[ENC]   not enough input to parse rule 0 U_INT_8
Nov  4 17:37:52 ip-172-16-254-163 charon: 12[ENC] payload type DELETE could not be parsed
Nov  4 17:37:52 ip-172-16-254-163 charon: 12[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov  4 17:37:55 ip-172-16-254-163 charon: 05[NET] received unencrypted informational: from xxx.xxx.xxx.103[4500] to 172.16.254.163[4500]
Nov  4 17:37:55 ip-172-16-254-163 charon: 05[ENC]   not enough input to parse rule 0 U_INT_8
Nov  4 17:37:55 ip-172-16-254-163 charon: 05[ENC] payload type DELETE could not be parsed
Nov  4 17:37:55 ip-172-16-254-163 charon: 05[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov  4 17:37:58 ip-172-16-254-163 charon: 15[NET] received unencrypted informational: from xxx.xxx.xxx.103[4500] to 172.16.254.163[4500]
Nov  4 17:37:58 ip-172-16-254-163 charon: 15[ENC]   not enough input to parse rule 0 U_INT_8
Nov  4 17:37:58 ip-172-16-254-163 charon: 15[ENC] payload type DELETE could not be parsed
Nov  4 17:37:58 ip-172-16-254-163 charon: 15[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov  4 17:38:01 ip-172-16-254-163 charon: 13[NET] received unencrypted informational: from xxx.xxx.xxx.103[4500] to 172.16.254.163[4500]
Nov  4 17:38:01 ip-172-16-254-163 charon: 13[ENC]   not enough input to parse rule 0 U_INT_8
Nov  4 17:38:01 ip-172-16-254-163 charon: 13[ENC] payload type DELETE could not be parsed
Nov  4 17:38:01 ip-172-16-254-163 charon: 13[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov  4 17:38:05 ip-172-16-254-163 charon: 10[NET] received packet: from xxx.xxx.xxx.87[500] to 172.16.254.163[500] (272 bytes)
Nov  4 17:38:05 ip-172-16-254-163 charon: 10[ENC] parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) ]
Nov  4 17:38:05 ip-172-16-254-163 charon: 10[CFG] looking for an ike config for 172.16.254.163...xxx.xxx.xxx.87
Nov  4 17:38:05 ip-172-16-254-163 charon: 10[CFG]   candidate: %any...%any, prio 28
Nov  4 17:38:05 ip-172-16-254-163 charon: 10[CFG] found matching ike config: %any...%any with prio 28
Nov  4 17:38:05 ip-172-16-254-163 charon: 10[IKE] xxx.xxx.xxx.87 is initiating an IKE_SA
[...]
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 9, the same policy for reqid 7 exists
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[IKE] unable to install IPsec policies (SPD) in kernel
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[IKE] closing IKE_SA due CHILD_SA setup failure
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[KNL] deleting policy 0.0.0.0/0 === 10.19.48.1/32 out
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[KNL] policy still used by another CHILD_SA, not removed
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[KNL] not updating policy 0.0.0.0/0 === 10.19.48.1/32 out [priority 383615, refcount 1]
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[KNL] deleting policy 10.19.48.1/32 === 0.0.0.0/0 in
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[KNL] deleting policy 10.19.48.1/32 === 0.0.0.0/0 fwd
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[KNL] deleting policy ::/0 === fd9d:bc11:4020::1/128 out
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[KNL] policy still used by another CHILD_SA, not removed
Nov  4 17:38:05 ip-172-16-254-163 charon: 08[KNL] not updating policy ::/0 === fd9d:bc11:4020::1/128 out [priority 334463, refcount 1]

@TC1977
Contributor

TC1977 commented Nov 6, 2018

Similar failure today, after no failures yesterday. I'm now getting rid of forceencaps=yes in ipsec.conf, and close_ike_on_child_failure = yes and keep_alive = 25s in charon.conf. Next idea would be changing the client .mobileconfig, but I really didn't want to go down that route if possible.

Nov  6 15:09:32 ip-172-16-254-163 charon: 14[NET] received unencrypted informational: from xx.xx.xx.31[4500] to 172.16.254.163[4500]
Nov  6 15:09:32 ip-172-16-254-163 charon: 14[ENC]   not enough input to parse rule 0 U_INT_8
Nov  6 15:09:32 ip-172-16-254-163 charon: 14[ENC] payload type DELETE could not be parsed
Nov  6 15:09:32 ip-172-16-254-163 charon: 14[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov  6 15:09:35 ip-172-16-254-163 charon: 05[NET] received unencrypted informational: from xx.xx.xx.31[4500] to 172.16.254.163[4500]
Nov  6 15:09:35 ip-172-16-254-163 charon: 05[ENC]   not enough input to parse rule 0 U_INT_8
Nov  6 15:09:35 ip-172-16-254-163 charon: 05[ENC] payload type DELETE could not be parsed
Nov  6 15:09:35 ip-172-16-254-163 charon: 05[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov  6 15:09:38 ip-172-16-254-163 charon: 06[NET] received unencrypted informational: from xx.xx.xx.31[4500] to 172.16.254.163[4500]
Nov  6 15:09:38 ip-172-16-254-163 charon: 06[ENC]   not enough input to parse rule 0 U_INT_8
Nov  6 15:09:38 ip-172-16-254-163 charon: 06[ENC] payload type DELETE could not be parsed
Nov  6 15:09:38 ip-172-16-254-163 charon: 06[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov  6 15:09:41 ip-172-16-254-163 charon: 12[NET] received unencrypted informational: from xx.xx.xx.31[4500] to 172.16.254.163[4500]
Nov  6 15:09:41 ip-172-16-254-163 charon: 12[ENC]   not enough input to parse rule 0 U_INT_8
Nov  6 15:09:41 ip-172-16-254-163 charon: 12[ENC] payload type DELETE could not be parsed
Nov  6 15:09:41 ip-172-16-254-163 charon: 12[IKE] INFORMATIONAL request with message ID 0 processing failed

@jackivanov
Collaborator

Has anyone tried configuring rekeying properly? That seems to be the cause of everything here.

@jackivanov
Collaborator

jackivanov commented Nov 8, 2018

I'd rather file an issue on the strongSwan bug tracker. Or has someone done that already?

@davidemyers
Contributor

The symptoms we're seeing are similar to those already reported here: https://wiki.strongswan.org/issues/2607

I've not added to that issue as I don't feel I really understand what's going on.

@TC1977
Contributor

TC1977 commented Nov 10, 2018

I don't understand what's going on at all. I had another hard connect/reconnect loop 2 days ago, but didn't see any "payload type DELETE could not be parsed" messages. I'm now trying rekey=yes with make_before_break disabled. I also went back to uniqueids=never, as uniqueids=yes only seemed to make it impossible to recover from a stale policy (since it would loop back to the same virtual IP again and again).

Ultimately I think solving the problem will also require changing settings in the client config, which means editing the .mobileconfig in Apple Configurator and reinstalling it on every iteration.

Is this problem only seen with Apple devices, and only when "Connect on Demand" is enabled?

@TC1977
Contributor

TC1977 commented Nov 20, 2018

> Sep 19 14:20:31 ip-172-16-254-145 charon: 15[KNL] creating acquire job for policy 40.97.145.146/32[tcp/https] === 10.19.48.4/32[tcp/54237] with reqid {14}
> Sep 19 14:20:31 ip-172-16-254-145 charon: 15[CFG] trap not found, unable to acquire reqid 14

If anyone is still following this, please try the following. The next time you get a reconnect loop, run grep "creating acquire job" /var/log/syslog and post the output here. (Make sure your strongSwan logging is at level 2 or higher, at least for 'knl'.) Looking back at this log, I noticed that 40.97.145.146 is an IP owned by Microsoft, probably on Azure, and I had another log around here somewhere that also implicated a Microsoft-owned IP. I wonder if we're all having problems with one specific provider blocking IPsec traffic. That might explain why some of us have problems nearly every day while others don't.
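To make those grep results easier to compare across reports, here is a minimal Python sketch that tallies the remote address in each "creating acquire job" line. The function name count_acquire_peers and the vpn_prefix default (taken from the 10.19.48.x client address in the log above) are assumptions for illustration, not part of Algo or strongSwan.

```python
import re
from collections import Counter

# Matches the two traffic-selector addresses in a "creating acquire job" line,
# e.g. "... policy 40.97.145.146/32[tcp/https] === 10.19.48.4/32[tcp/54237] ..."
ACQUIRE_RE = re.compile(
    r"creating acquire job for policy "
    r"(?P<src>[0-9a-fA-F.:]+)/\d+\S* === (?P<dst>[0-9a-fA-F.:]+)/\d+"
)


def count_acquire_peers(lines, vpn_prefix="10.19.48."):
    """Count every address in acquire-job lines that is not a VPN client IP.

    vpn_prefix is an assumption: adjust it to your own rightsourceip range.
    """
    counts = Counter()
    for line in lines:
        match = ACQUIRE_RE.search(line)
        if not match:
            continue  # not an acquire-job line
        for addr in (match.group("src"), match.group("dst")):
            if not addr.startswith(vpn_prefix):
                counts[addr] += 1
    return counts


# Demo on the two lines quoted above; for a real run, pass open("/var/log/syslog").
sample = [
    "Sep 19 14:20:31 ip-172-16-254-145 charon: 15[KNL] creating acquire job "
    "for policy 40.97.145.146/32[tcp/https] === 10.19.48.4/32[tcp/54237] "
    "with reqid {14}",
    "Sep 19 14:20:31 ip-172-16-254-145 charon: 15[CFG] trap not found, "
    "unable to acquire reqid 14",
]
for addr, hits in count_acquire_peers(sample).most_common():
    print(addr, hits)
```

If the same handful of addresses keeps showing up across several people's logs, that would support the single-provider theory; a wide spread would argue against it.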

Meanwhile I've modified my ipsec.conf and the mobileconfigs further, with some success, but maybe at this point I'm better off putting these into a separate branch to track the changes more easily.

@TC1977
Contributor

TC1977 commented Nov 23, 2018

I've been running the same config for one week at this point with no stale policies and no connect/reconnect loops. Here's my /etc/ipsec.conf:

config setup
    uniqueids=yes # do not allow multiple connections per user
    charondebug="ike 2, knl 2, cfg 2, net 1, esp 1, enc 1, dmn 1, mgr 1"

conn %default
    fragmentation=yes
    rekey=yes
    reauth=no
    dpdaction=clear
    keyexchange=ikev2
    compress=yes
    dpddelay=35s
    lifetime=3h
    ikelifetime=12h

    ike=aes256gcm16-prfsha512-ecp384,aes256-sha2_512-prfsha512-ecp384,aes256-sha2_384-prfsha384-ecp384!
    esp=aes256gcm16-ecp384,aes256-sha2_512-prfsha512-ecp384!

    left=%any
    leftauth=pubkey
    leftid=[redacted IP]
    leftcert=[redacted IP].crt
    leftsendcert=always
    leftsubnet=0.0.0.0/0,::/0

    right=%any
    rightauth=pubkey
    rightsourceip=10.19.48.0/24,fd9d:bc11:4020::/48
    rightdns=172.16.0.1

conn ikev2-pubkey
    auto=add

Here's my /etc/strongswan.d/charon.conf. The rationale behind the changes is to minimize a pause I observe when switching from Wi-Fi to LTE. It seems the iPhone opens a new IKE_SA, which hangs, and then another one, which succeeds. Going from LTE to Wi-Fi is seamless.

# Options for the charon IKE daemon.
charon {

    # Accept unencrypted ID and HASH payloads in IKEv1 Main Mode.
    # accept_unencrypted_mainmode_messages = no

    # Maximum number of half-open IKE_SAs for a single peer IP.
    # block_threshold = 5

    # Whether Certificate Revocation Lists (CRLs) fetched via HTTP or LDAP
    # should be saved under a unique file name derived from the public key of
    # the Certification Authority (CA) to /etc/ipsec.d/crls (stroke) or
    # /etc/swanctl/x509crl (vici), respectively.
    # cache_crls = no

    # Whether relations in validated certificate chains should be cached in
    # memory.
    # cert_cache = yes

    # Send Cisco Unity vendor ID payload (IKEv1 only).
    # cisco_unity = no

    # Close the IKE_SA if setup of the CHILD_SA along with IKE_AUTH failed.
    close_ike_on_child_failure = yes

    # Number of half-open IKE_SAs that activate the cookie mechanism.
    # cookie_threshold = 10

    # Delete CHILD_SAs right after they got successfully rekeyed (IKEv1 only).
    # delete_rekeyed = no

    # Delay in seconds until inbound IPsec SAs are deleted after rekeyings
    # (IKEv2 only).
    # delete_rekeyed_delay = 10

    # Use ANSI X9.42 DH exponent size or optimum size matched to cryptographic
    # strength.
    # dh_exponent_ansi_x9_42 = yes

    # Use RTLD_NOW with dlopen when loading plugins and IMV/IMCs to reveal
    # missing symbols immediately.
    # dlopen_use_rtld_now = no

    # DNS server assigned to peer via configuration payload (CP).
    # dns1 =

    # DNS server assigned to peer via configuration payload (CP).
    # dns2 =

    # Enable Denial of Service protection using cookies and aggressiveness
    # checks.
    # dos_protection = yes

    # Compliance with the errata for RFC 4753.
    # ecp_x_coordinate_only = yes

    # Free objects during authentication (might conflict with plugins).
    # flush_auth_cfg = no

    # Whether to follow IKEv2 redirects (RFC 5685).
    # follow_redirects = yes

    # Maximum size (complete IP datagram size in bytes) of a sent IKE fragment
    # when using proprietary IKEv1 or standardized IKEv2 fragmentation, defaults
    # to 1280 (use 0 for address family specific default values, which uses a
    # lower value for IPv4).  If specified this limit is used for both IPv4 and
    # IPv6.
    # fragment_size = 1280

    # Name of the group the daemon changes to after startup.
    # group =

    # Timeout in seconds for connecting IKE_SAs (also see IKE_SA_INIT DROPPING).
    half_open_timeout = 5

    # Enable hash and URL support.
    # hash_and_url = no

    # Allow IKEv1 Aggressive Mode with pre-shared keys as responder.
    # i_dont_care_about_security_and_use_aggressive_mode_psk = no

    # Whether to ignore the traffic selectors from the kernel's acquire events
    # for IKEv2 connections (they are not used for IKEv1).
    # ignore_acquire_ts = no

    # A space-separated list of routing tables to be excluded from route
    # lookups.
    # ignore_routing_tables =

    # Maximum number of IKE_SAs that can be established at the same time before
    # new connection attempts are blocked.
    # ikesa_limit = 0

    # Number of exclusively locked segments in the hash table.
    # ikesa_table_segments = 1

    # Size of the IKE_SA hash table.
    # ikesa_table_size = 1

    # Whether to close IKE_SA if the only CHILD_SA closed due to inactivity.
    inactivity_close_ike = yes

    # Limit new connections based on the current number of half open IKE_SAs,
    # see IKE_SA_INIT DROPPING in strongswan.conf(5).
    # init_limit_half_open = 0

    # Limit new connections based on the number of queued jobs.
    # init_limit_job_load = 0

    # Causes charon daemon to ignore IKE initiation requests.
    # initiator_only = no

    # Install routes into a separate routing table for established IPsec
    # tunnels.
    # install_routes = yes

    # Install virtual IP addresses.
    # install_virtual_ip = yes

    # The name of the interface on which virtual IP addresses should be
    # installed.
    # install_virtual_ip_on =

    # Check daemon, libstrongswan and plugin integrity at startup.
    # integrity_test = no

    # A comma-separated list of network interfaces that should be ignored, if
    # interfaces_use is specified this option has no effect.
    # interfaces_ignore =

    # A comma-separated list of network interfaces that should be used by
    # charon. All other interfaces are ignored.
    # interfaces_use =

    # NAT keep alive interval.
    keep_alive = 25s

    # Plugins to load in the IKE daemon charon.
    # load =

    # Determine plugins to load via each plugin's load option.
    # load_modular = no

    # Initiate IKEv2 reauthentication with a make-before-break scheme.
    # make_before_break = yes

    # Maximum number of IKEv1 phase 2 exchanges per IKE_SA to keep state about
    # and track concurrently.
    # max_ikev1_exchanges = 3

    # Maximum packet size accepted by charon.
    # max_packet = 10000

    # Enable multiple authentication exchanges (RFC 4739).
    # multiple_authentication = yes

    # WINS servers assigned to peer via configuration payload (CP).
    # nbns1 =

    # WINS servers assigned to peer via configuration payload (CP).
    # nbns2 =

    # UDP port used locally. If set to 0 a random port will be allocated.
    # port = 500

    # UDP port used locally in case of NAT-T. If set to 0 a random port will be
    # allocated.  Has to be different from charon.port, otherwise a random port
    # will be allocated.
    # port_nat_t = 4500

    # Whether to prefer updating SAs to the path with the best route.
    # prefer_best_path = no

    # Prefer locally configured proposals for IKE/IPsec over supplied ones as
    # responder (disabling this can avoid keying retries due to
    # INVALID_KE_PAYLOAD notifies).
    # prefer_configured_proposals = yes

    # By default public IPv6 addresses are preferred over temporary ones (RFC
    # 4941), to make connections more stable. Enable this option to reverse
    # this.
    # prefer_temporary_addrs = no

    # Process RTM_NEWROUTE and RTM_DELROUTE events.
    # process_route = yes

    # Delay in ms for receiving packets, to simulate larger RTT.
    # receive_delay = 0

    # Delay request messages.
    # receive_delay_request = yes

    # Delay response messages.
    # receive_delay_response = yes

    # Specific IKEv2 message type to delay, 0 for any.
    # receive_delay_type = 0

    # Size of the AH/ESP replay window, in packets.
    # replay_window = 32

    # Base to use for calculating exponential back off, see IKEv2 RETRANSMISSION
    # in strongswan.conf(5).
    # retransmit_base = 1.8

    # Maximum jitter in percent to apply randomly to calculated retransmission
    # timeout (0 to disable).
    # retransmit_jitter = 0

    # Upper limit in seconds for calculated retransmission timeout (0 to
    # disable).
    # retransmit_limit = 0

    # Timeout in seconds before sending first retransmit.
    # retransmit_timeout = 4.0

    # Number of times to retransmit a packet before giving up.
    # retransmit_tries = 5

    # Interval in seconds to use when retrying to initiate an IKE_SA (e.g. if
    # DNS resolution failed), 0 to disable retries.
    # retry_initiate_interval = 0

    # Initiate CHILD_SA within existing IKE_SAs (always enabled for IKEv1).
    reuse_ikesa = yes

    # Numerical routing table to install routes to.
    # routing_table =

    # Priority of the routing table.
    # routing_table_prio =

    # Whether to use RSA with PSS padding instead of PKCS#1 padding by default.
    # rsa_pss = no

    # Delay in ms for sending packets, to simulate larger RTT.
    # send_delay = 0

    # Delay request messages.
    # send_delay_request = yes

    # Delay response messages.
    # send_delay_response = yes

    # Specific IKEv2 message type to delay, 0 for any.
    # send_delay_type = 0

    # Send strongSwan vendor ID payload
    # send_vendor_id = no

    # Whether to enable Signature Authentication as per RFC 7427.
    # signature_authentication = yes

    # Whether to enable constraints against IKEv2 signature schemes.
    # signature_authentication_constraints = yes

    # The upper limit for SPIs requested from the kernel for IPsec SAs.
    # spi_max = 0xcfffffff

    # The lower limit for SPIs requested from the kernel for IPsec SAs.
    # spi_min = 0xc0000000

    # Number of worker threads in charon.
    # threads = 16

    # Name of the user the daemon changes to after startup.
    # user =

    crypto_test {

        # Benchmark crypto algorithms and order them by efficiency.
        # bench = no

        # Buffer size used for crypto benchmark.
        # bench_size = 1024

        # Number of iterations to test each algorithm.
        # bench_time = 50

        # Test crypto algorithms during registration (requires test vectors
        # provided by the test-vectors plugin).
        # on_add = no

        # Test crypto algorithms on each crypto primitive instantiation.
        # on_create = no

        # Strictly require at least one test vector to enable an algorithm.
        # required = no

        # Whether to test RNG with TRUE quality; requires a lot of entropy.
        # rng_true = no

    }

    host_resolver {

        # Maximum number of concurrent resolver threads (they are terminated if
        # unused).
        # max_threads = 3

        # Minimum number of resolver threads to keep around.
        # min_threads = 0

    }

    leak_detective {

        # Includes source file names and line numbers in leak detective output.
        # detailed = yes

        # Threshold in bytes for leaks to be reported (0 to report all).
        # usage_threshold = 10240

        # Threshold in number of allocations for leaks to be reported (0 to
        # report all).
        # usage_threshold_count = 0

    }

    processor {

        # Section to configure the number of reserved threads per priority class
        # see JOB PRIORITY MANAGEMENT in strongswan.conf(5).
        priority_threads {

        }

    }

    # Section containing a list of scripts (name = path) that are executed when
    # the daemon is started.
    start-scripts {

    }

    # Section containing a list of scripts (name = path) that are executed when
    # the daemon is terminated.
    stop-scripts {

    }

    tls {

        # List of TLS encryption ciphers.
        # cipher =

        # List of TLS key exchange methods.
        # key_exchange =

        # List of TLS MAC algorithms.
        # mac =

        # List of TLS cipher suites.
        # suites =

    }

    x509 {

        # Discard certificates with unsupported or unknown critical extensions.
        # enforce_critical = yes

    }

}
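Only a handful of lines in that file are actually uncommented, and they are easy to miss among the defaults. As a rough aid, the following Python sketch lists the active settings in a strongswan.conf-style file. The helper name active_settings is hypothetical, and the parser is deliberately simplistic: it assumes one "key = value" per line and treats any "#" as the start of a comment.

```python
def active_settings(conf_text):
    """Return {dotted.key: value} for every uncommented 'key = value' line.

    Simplified parser: assumes one setting per line and no '#' inside values.
    """
    settings = {}
    sections = []  # stack of enclosing section names, e.g. ["charon", "tls"]
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            sections.append(line[:-1].strip())
        elif line == "}":
            if sections:
                sections.pop()
        elif "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            settings[".".join(sections + [key])] = value
    return settings


# Demo on a shortened excerpt of the file above; for a real run, pass
# open("/etc/strongswan.d/charon.conf").read() instead.
excerpt = """
charon {
    # cisco_unity = no
     close_ike_on_child_failure = yes
     half_open_timeout = 5
    crypto_test {
        # bench = no
    }
}
"""
for key, value in active_settings(excerpt).items():
    print(f"{key} = {value}")
```

Run against the full file above, this should report the five settings that differ from the commented-out defaults: close_ike_on_child_failure, half_open_timeout, inactivity_close_ike, keep_alive, and reuse_ikesa.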

In addition, I've changed <key>LifeTimeInMinutes</key> in my mobileconfig from <integer>20</integer> to <integer>1440</integer> in both places where the field appears.
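For anyone who would rather script that mobileconfig change than edit it by hand, here is a minimal Python sketch using the standard plistlib module. It assumes the profile is an unsigned XML property list; the helper name set_lifetime and the nested layout of the demo profile are assumptions for illustration, so check them against your own file.

```python
import plistlib


def set_lifetime(node, minutes):
    """Recursively set every LifeTimeInMinutes value; return how many changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in list(node.items()):
            if key == "LifeTimeInMinutes":
                node[key] = minutes
                changed += 1
            else:
                changed += set_lifetime(value, minutes)
    elif isinstance(node, list):
        for item in node:
            changed += set_lifetime(item, minutes)
    return changed


# Demo on an in-memory profile whose layout is assumed, not copied from Algo.
profile = {
    "PayloadContent": [
        {
            "IKEv2": {
                "IKESecurityAssociationParameters": {"LifeTimeInMinutes": 20},
                "ChildSecurityAssociationParameters": {"LifeTimeInMinutes": 20},
            }
        }
    ]
}
print("fields changed:", set_lifetime(profile, 1440))

# Serialize back to XML plist bytes. For a real file, read it with
# plistlib.load(fp) and write it with plistlib.dump(profile, fp),
# opening the file in binary mode both times.
xml_bytes = plistlib.dumps(profile)
```

The profile still has to be reinstalled on the device after the change, and a signed profile would need to be re-signed, which this sketch does not attempt.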

@digeratus Maybe you want to test this config out? I'd like to know if this config plays well with non-Apple clients.

@digeratus

@TC1977 I just got back. Will test today

@TC1977
Contributor

TC1977 commented Dec 3, 2018

Still no further "policy failed" errors after another week, running with rekey=yes, uniqueids=yes, and longer lifetimes on the iOS mobileconfig side. The "received unencrypted informational:" messages still come through occasionally but don't cause any orphaned policies.

The changes are available at TC1977/algo, with one exception. The /etc/strongswan.d/charon.conf setting changes aren't included because I have no idea how to create that file and refer to it during installation. (I really don't know what I'm doing here.)

ubuntu@ip-172-16-254-163:~$ journalctl -u strongswan|grep failed
Nov 18 21:24:55 ip-172-16-254-163 charon[871]: 10[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:24:58 ip-172-16-254-163 charon[871]: 06[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:01 ip-172-16-254-163 charon[871]: 08[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:04 ip-172-16-254-163 charon[871]: 14[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:08 ip-172-16-254-163 ipsec[782]: 10[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:08 ip-172-16-254-163 ipsec[782]: 06[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:08 ip-172-16-254-163 ipsec[782]: 08[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:08 ip-172-16-254-163 ipsec[782]: 14[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 25 17:00:46 ip-172-16-254-163 charon[871]: 05[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 26 09:21:36 ip-172-16-254-163 ipsec[782]: 05[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 29 09:30:23 ip-172-16-254-163 charon[871]: 15[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 29 09:31:30 ip-172-16-254-163 ipsec[782]: 15[IKE] INFORMATIONAL request with message ID 0 processing failed
ubuntu@ip-172-16-254-163:~$ journalctl -u strongswan|grep unable
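The journal output above appears to record each failure twice, once under charon[871] and again, slightly later, under ipsec[782]. Assuming that reading is right, this Python sketch counts failures per day from a single process so the duplicates don't inflate the totals. The function name failures_per_day is hypothetical, and the regular expression assumes the classic syslog timestamp format shown above.

```python
import re
from collections import Counter

# "Nov 18 21:24:55 ip-172-16-254-163 charon[871]: ..." -> day and process name
LINE_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d\d:\d\d:\d\d \S+ (?P<proc>\w+)\[\d+\]: "
)


def failures_per_day(lines, process="charon"):
    """Count 'processing failed' lines per day, for one process name only."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("proc") == process and "processing failed" in line:
            counts[match.group("day")] += 1
    return counts


# Demo on a few lines from the output above; for a real run, feed it the
# output of `journalctl -u strongswan` line by line.
sample = [
    "Nov 18 21:24:55 ip-172-16-254-163 charon[871]: 10[IKE] INFORMATIONAL request with message ID 0 processing failed",
    "Nov 18 21:25:08 ip-172-16-254-163 ipsec[782]: 10[IKE] INFORMATIONAL request with message ID 0 processing failed",
    "Nov 25 17:00:46 ip-172-16-254-163 charon[871]: 05[IKE] INFORMATIONAL request with message ID 0 processing failed",
]
for day, hits in failures_per_day(sample).items():
    print(day, hits)
```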

@digeratus

@TC1977 Seems like things are ok on this front. Has anyone else tested?

@dguido closed this as completed Feb 17, 2019
This was referenced Mar 26, 2019