Message retry timeout. Check the connection between local and remote gateway endpoints.
We have a main office and four branch offices. The main office (M200) has BOVPNs to all four branch offices. Two of the branch offices (Site 2 and Site 3) also have a BOVPN between each other. Last night the tunnel between Site 2 and Site 3 went down, and I'm getting this error on both sides: Message retry timeout. Check the connection between local and remote gateway endpoints.
The tunnels from the main office to Site 2 and Site 3 never went down and remain up. I can't figure out why I can't bring the BOVPN between S2 and S3 back up. No changes were made to any configurations. I tried deleting the BOVPN between S2 and S3 and re-creating it, but I still get the same error.
I'll post the diagnostics in a reply, as they are too long to include here.
Any help on this would be greatly appreciated!
Comments
Site 2 Side Diagnostic:
*** WG Diagnostic Report for Gateway "S3" ***
Created On: Thu Apr 15 11:37:09 2021
[Conclusion]
Error Messages for Gateway Endpoint #1(name "S3")
Apr 15 11:37:03 2021 ERROR 0x02030015 Message retry timeout. Check the connection between local and remote gateway endpoints.
[Gateway Summary]
Gateway "S3" contains "1" gateway endpoint(s). IKE Version is IKEv1.
Gateway Endpoint #1 (name "S3") Enabled
Mode: Main
PFS: Disabled AlwaysUp: Disabled
DPD: Enabled Keepalive: Disabled
Local ID<->Remote ID: {IP_ADDR(S2_IP_ADDRESS) <-> IP_ADDR(S3_IP_ADDRESS)}
Local GW_IP<->Remote GW_IP: {S2_IP_ADDRESS <-> S3_IP_ADDRESS}
Outgoing Interface: eth0 (ifIndex=3)
ifMark=0x10000
linkStatus=0 (0:unknown, 1:down, 2:up)
Stored user messages:
Apr 15 11:37:03 2021 ERROR 0x02030015 Message retry timeout. Check the connection between local and remote gateway endpoints.
[Tunnel Summary]
"1" tunnel(s) are found using the previous gateway
[Run-time Info (gateway IKE_SA)]
[Run-time Info (tunnel IPSEC_SA)]
"0" IPSEC SA(s) are found under tunnel "ToS3"
[Run-time Info (tunnel IPSEC_SP)]
"1" IPSEC SP(s) are found under tunnel "ToS3"
#1
Tunnel Endpoint: "S2_IP_ADDRESS->S3_IP_ADDRESS"
Tunnel Selector: S2_LOCAL_IP_ADDRESS/24 -> S3_LOCAL_IP_ADDRESS/24 Proto: ANY
Created On: Thu Apr 15 11:02:44 2021
Gateway Name: "S3"
Tunnel Name: "ToS3"
[Address Pairs in Firewalld]
Address Pairs for tunnel "ToS3"
Direction: BOTH
S2_LOCAL_IP_ADDRESS/24 <-> S3_LOCAL_IP_ADDRESS/24
[Policy checker result]
Tunnel name: ToS3
#1 tunnel route S2_LOCAL_IP_ADDRESS/24<->S3_LOCAL_IP_ADDRESS/24
No policy checker results for this tunnel(no P2SA found or some other error)
[Related Logs]
<158>Apr 15 11:36:50 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)******** RECV an IKE packet at S2_IP_ADDRESS:500(socket=14 ifIndex=3) from Peer S3_IP_ADDRESS:500 ********
<158>Apr 15 11:36:51 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS:500. Gateway-Endpoint:S3 p1saId:0x0
<158>Apr 15 11:36:51 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS. Gateway-Endpoint:S3 p1saId:0x0
<158>Apr 15 11:36:54 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)******** RECV an IKE packet at S2_IP_ADDRESS:500(socket=14 ifIndex=3) from Peer S3_IP_ADDRESS:500 ********
<158>Apr 15 11:36:55 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS. Gateway-Endpoint:S3 p1saId:0x0
<158>Apr 15 11:36:55 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS:500. Gateway-Endpoint:S3 p1saId:0x0
<158>Apr 15 11:36:58 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)******** RECV an IKE packet at S2_IP_ADDRESS:500(socket=14 ifIndex=3) from Peer S3_IP_ADDRESS:500 ********
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS:500. Gateway-Endpoint:S3 p1saId:0x0
<155>Apr 15 11:36:59 iked[1766]: msg_id="0203-0015" (S2_IP_ADDRESS<->S3_IP_ADDRESS)IKE phase-1 negotiation from S2_IP_ADDRESS:500 to S3_IP_ADDRESS failed. Gateway-Endpoint='S3' Reason=Message retry timeout. Check the connection between local and remote gateway endpoints.
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)ike_p1_status_chg: ikePcyName=S3, status=DOWN
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)MWAN-Failover notify ikePcy=0xb90fb8(S3 ver#1), mwanFlags:0x00000000 p1said=0x0 DOWN continuous-fails:110
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)WAN-Failover: start "AlwaysUp" timer(expires in 20s) for ikePcy(S3)
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete Isakmp SA 0x8b33a8 for Gateway S3. State:3
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete QMState SA 0x8b79f0 for Gateway S3
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteQMState: try to delete QMState 0x8b79f0 (ID 0) with IsakmpSA(0x8b33a8) Gateway(S3)
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)SA Nego Fail: saHandle 0x0xbb5048 InitMode 1, reason 2
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)SA Nego Fail: free saHandle, ipsecPcy("ToS3")
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Totally 1 Pending P2 SA Requests Got Dropped.
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One Retry and Life Timer
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One DPD Retry timer
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)ikeSADeleteFromCookieHashTable: IKE SA event: Delete IsakmpSA(0x8b33a8) in IkeIsakmpSATable[82],pPrev((nil)) pNext((nil)) ikePcy(S3) Cookies(i=e4d3a22f15d0fbe7 r=0000000000000000)
<158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: reclaim isakmpSA(0x8b33a8)'s memory and mark it as "FREED"
<155>Apr 15 11:37:03 iked[1766]: msg_id="0203-0015" (S2_IP_ADDRESS<->S3_IP_ADDRESS)IKE phase-1 negotiation from S2_IP_ADDRESS:500 to S3_IP_ADDRESS:500 failed. Gateway-Endpoint='S3' Reason=Message retry timeout. Check the connection between local and remote gateway endpoints.
<158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)ike_p1_status_chg: ikePcyName=S3, status=DOWN
<158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)MWAN-Failover notify ikePcy=0xb90fb8(S3 ver#1), mwanFlags:0x00000000 p1said=0x0 DOWN continuous-fails:111
<158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete Isakmp SA 0x8ae5c8 for Gateway S3. State:5
<158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Totally 0 Pending P2 SA Requests Got Dropped.
<158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One Retry and Life Timer
<158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One DPD Retry timer
<158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)ikeSADeleteFromCookieHashTable: IKE SA event: Delete IsakmpSA(0x8ae5c8) in IkeIsakmpSATable[242],pPrev((nil)) pNext((nil)) ikePcy(S3) Cookies(i=3ff82523949702e3 r=98a30467932b57a4)
<158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: reclaim isakmpSA(0x8ae5c8)'s memory and mark it as "FREED"
Site 3 Side Diagnostic:
*** WG Diagnostic Report for Gateway "S2" ***
Created On: Thu Apr 15 10:04:04 2021
[Conclusion]
Error Messages for Gateway Endpoint #1(name "S2")
Apr 15 10:04:00 2021 ERROR 0x02030015 Message retry timeout. Check the connection between local and remote gateway endpoints.
[Gateway Summary]
Gateway "S2" contains "1" gateway endpoint(s). IKE Version is IKEv1.
Gateway Endpoint #1 (name "S2") Enabled
Mode: Main
PFS: Disabled AlwaysUp: Disabled
DPD: Enabled Keepalive: Disabled
Local ID<->Remote ID: {IP_ADDR(S3_IP_ADDRESS) <-> IP_ADDR(S2_IP_ADDRESS)}
Local GW_IP<->Remote GW_IP: {S3_IP_ADDRESS <-> S2_IP_ADDRESS}
Outgoing Interface: eth0 (ifIndex=4)
ifMark=0x10000
linkStatus=0 (0:unknown, 1:down, 2:up)
Stored user messages:
Apr 15 10:04:00 2021 ERROR 0x02030015 Message retry timeout. Check the connection between local and remote gateway endpoints.
[Tunnel Summary]
"1" tunnel(s) are found using the previous gateway
[Run-time Info (gateway IKE_SA)]
[Run-time Info (tunnel IPSEC_SA)]
"0" IPSEC SA(s) are found under tunnel "ToS2"
[Run-time Info (tunnel IPSEC_SP)]
"1" IPSEC SP(s) are found under tunnel "ToS2"
#1
Tunnel Endpoint: "S3_IP_ADDRESS->S2_IP_ADDRESS"
Tunnel Selector: S3_LOCAL_IP_ADDRESS/24 -> S2_LOCAL_IP_ADDRESS/24 Proto: ANY
Created On: Thu Apr 15 10:02:51 2021
Gateway Name: "S2"
Tunnel Name: "ToS2"
[Address Pairs in Firewalld]
Address Pairs for tunnel "ToS2"
Direction: BOTH
S3_LOCAL_IP_ADDRESS/24 <-> S2_LOCAL_IP_ADDRESS/24
[Policy checker result]
Tunnel name: ToS2
#1 tunnel route S3_LOCAL_IP_ADDRESS/24<->S2_LOCAL_IP_ADDRESS/24
No policy checker results for this tunnel(no P2SA found or some other error)
[Related Logs]
<158>Apr 15 10:03:47 iked[2686]: alwaysUpTimerCb trigger autoStart for ikePcy(S2) ipsecPcy(ToS2)
<158>Apr 15 10:03:47 iked[2686]: AUTOSTART: RECV ipecPcy(ToS2), ikePcy(S2), ifIndex(4), tunnel_src=S3_IP_ADDRESS, tunnel_dst=S2_IP_ADDRESS
<158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)do the ACQUIRE action for the tunnel route [src:S3_LOCAL_IP_ADDRESS/24 <-> dst:S2_LOCAL_IP_ADDRESS/24], ike_ver=1, peer_udp_port=0
<158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(NATT)IkeFindIsakmpSABySPD: Matched IP and peer_udp_port=0 p1saId=0 : pIsakmpSA p1saID=0 DestPort=0
<158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(NATT)IkeFindIsakmpSABySPD: Matched IP and peer_udp_port=0 p1saId=0 : pIsakmpSA p1saID=0 DestPort=0
<158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)StartNegotiation: P1 negotiation is still going on... Increment Pending P2SA counter 1 (Gateway-Endpoint S2)
<158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(StartNego) maxPendingP2SARequest 128 current 1
<158>Apr 15 10:03:48 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)Resending phase-1 message to S2_IP_ADDRESS. Gateway-Endpoint:S2 p1saId:0x0
<158>Apr 15 10:03:52 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)Resending phase-1 message to S2_IP_ADDRESS. Gateway-Endpoint:S2 p1saId:0x0
<158>Apr 15 10:03:57 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)Resending phase-1 message to S2_IP_ADDRESS. Gateway-Endpoint:S2 p1saId:0x0
<158>Apr 15 10:03:58 iked[2686]: *******recv IPSEC_ACQUIRE message, trying to trigger the tunnel negotiation for gateway(S2), tunnel(ToS2), ifindex(4), ifNum(0)
<158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)do the ACQUIRE action for the tunnel route [src:S3_LOCAL_IP_ADDRESS/24 <-> dst:S2_LOCAL_IP_ADDRESS/24], ike_ver=1, peer_udp_port=0
<158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(NATT)IkeFindIsakmpSABySPD: Matched IP and peer_udp_port=0 p1saId=0 : pIsakmpSA p1saID=0 DestPort=0
<158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(NATT)IkeFindIsakmpSABySPD: Matched IP and peer_udp_port=0 p1saId=0 : pIsakmpSA p1saID=0 DestPort=0
<158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)StartNegotiation: Already in process of QM negotiation
<158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)SA Nego Fail: saHandle 0x0x2c7ff218 InitMode 1, reason 2
<158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)SA Nego Fail: free saHandle, ipsecPcy("ToS2")
<155>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)ikeDoXfrmAcquireAction: IkeStartNegotiation failed - Err=-1 Gateway(S2)
<155>Apr 15 10:04:00 iked[2686]: msg_id="0203-0015" (S3_IP_ADDRESS<->S2_IP_ADDRESS)IKE phase-1 negotiation from S3_IP_ADDRESS:500 to S2_IP_ADDRESS failed. Gateway-Endpoint='S2' Reason=Message retry timeout. Check the connection between local and remote gateway endpoints.
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)ike_p1_status_chg: ikePcyName=S2, status=DOWN
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)MWAN-Failover notify ikePcy=0x2c811b28(S2 ver#1), mwanFlags:0x00000000 p1said=0x0 DOWN continuous-fails:2
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)WAN-Failover: start "AlwaysUp" timer(expires in 20s) for ikePcy(S2)
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete Isakmp SA 0x6175d0 for Gateway S2. State:3
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete QMState SA 0x6578a8 for Gateway S2
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteQMState: deleting QMState 0x6578a8 (ID 0 state:240) with IsakmpSA(0x6175d0) Gateway(S2)
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)SA Nego Fail: saHandle 0x0x2c629bd8 InitMode 1, reason 2
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)SA Nego Fail: free saHandle, ipsecPcy("ToS2")
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)Totally 1 Pending P2 SA Requests Got Dropped.
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One Retry and Life Timer
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One DPD Retry timer
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)ikeSADeleteFromCookieHashTable: IKE SA event: Delete IsakmpSA(0x6175d0) in IkeIsakmpSATable[12],pPrev((nil)) pNext((nil)) ikePcy(S2) Cookies(i=35a2a12f84ce2fe1 r=0000000000000000)
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: reclaim isakmpSA(0x6175d0)'s memory and mark it as "FREED"
For the record, what firewall models and software versions are at each end?
It looks like the two ends are out of sync when trying to bring up the VPN.
In this post, a change to the "IKE V1 password", which I assume means the shared key, fixed the issue. No idea why.
https://community.watchguard.com/watchguard-community/discussion/1683/trying-to-track-issue-down
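To compare the two sides more systematically, a small script can tally the IKE events in each [Related Logs] section. The sketch below is only an illustration: the function names (`summarize`, `unanswered`) are made up for this example, the patterns are copied from the iked lines quoted in the diagnostics above, and the sample input is an abridged copy of three Site 3 lines. It counts received IKE packets, phase-1 resends, and phase-1 failures, and lists any SA whose responder cookie is all zeros. In IKEv1 the initiator's first message carries a zero responder cookie, so an SA deleted in that state most likely never saw a reply from the peer.

```python
import re
from collections import Counter

# Patterns copied from the iked log lines quoted in the diagnostics above.
PATTERNS = {
    "recv": re.compile(r"RECV an IKE packet"),
    "resend": re.compile(r"Resending phase-1 message"),
    "p1_fail": re.compile(r"IKE phase-1 negotiation from .* failed"),
}
COOKIES = re.compile(r"Cookies\(i=([0-9a-f]{16}) r=([0-9a-f]{16})\)")


def summarize(log_text):
    """Count IKE events and collect (initiator, responder) cookie pairs."""
    counts = Counter()
    cookies = []
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
        match = COOKIES.search(line)
        if match:
            cookies.append(match.groups())
    return counts, cookies


def unanswered(cookies):
    """Return the SAs whose responder cookie is all zeros."""
    return [pair for pair in cookies if pair[1] == "0" * 16]


# Abridged sample: three lines shortened from the Site 3 log above.
SITE3_EXCERPT = """\
<158>Apr 15 10:03:48 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)Resending phase-1 message to S2_IP_ADDRESS. Gateway-Endpoint:S2 p1saId:0x0
<155>Apr 15 10:04:00 iked[2686]: msg_id="0203-0015" (S3_IP_ADDRESS<->S2_IP_ADDRESS)IKE phase-1 negotiation from S3_IP_ADDRESS:500 to S2_IP_ADDRESS failed. Gateway-Endpoint='S2' Reason=Message retry timeout.
<158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)ikeSADeleteFromCookieHashTable: Cookies(i=35a2a12f84ce2fe1 r=0000000000000000)
"""

if __name__ == "__main__":
    counts, cookies = summarize(SITE3_EXCERPT)
    print("events:", dict(counts))
    print("SAs with no responder cookie:", unanswered(cookies))
```

Run against the full logs from each side, this makes the asymmetry easier to see: the Site 2 excerpt contains "RECV an IKE packet" lines from Site 3, while the Site 3 excerpt shows only resends and no received packets from Site 2. The two reports were captured at different times, so treat that as a hint that traffic from Site 2 to Site 3 on UDP 500 may not be arriving, not as proof.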