Message retry timeout. Check the connection between local and remote gateway endpoints.

We have a main office and four branch offices. The main office (M200) has BOVPNs to all four branch offices. Two of the branch offices (Site 2 and Site 3) also have a BOVPN between each other. Last night the tunnel between Site 2 and Site 3 went down, and I'm getting this error on both sides: Message retry timeout. Check the connection between local and remote gateway endpoints.

The tunnels from the main office to Site 2 and Site 3 never went down and remain up. I can't figure out why the BOVPN between S2 and S3 won't come back up. No changes were made to any configurations. I tried deleting the BOVPN between S2 and S3 and re-creating it, but I still get the same error.

I'll post the diagnostics in a reply, as they are too long to include here.

Any help on this would be greatly appreciated!

Comments

  • Site 2 Side Diagnostic:

    *** WG Diagnostic Report for Gateway "S3" ***
    Created On: Thu Apr 15 11:37:09 2021

    [Conclusion]
    Error Messages for Gateway Endpoint #1(name "S3")
    Apr 15 11:37:03 2021 ERROR 0x02030015 Message retry timeout. Check the connection between local and remote gateway endpoints.

    [Gateway Summary]
    Gateway "S3" contains "1" gateway endpoint(s). IKE Version is IKEv1.
    Gateway Endpoint #1 (name "S3") Enabled
    Mode: Main
    PFS: Disabled AlwaysUp: Disabled
    DPD: Enabled Keepalive: Disabled
    Local ID<->Remote ID: {IP_ADDR(S2_IP_ADDRESS) <-> IP_ADDR(S3_IP_ADDRESS)}
    Local GW_IP<->Remote GW_IP: {S2_IP_ADDRESS <-> S3_IP_ADDRESS}
    Outgoing Interface: eth0 (ifIndex=3)
    ifMark=0x10000
    linkStatus=0 (0:unknown, 1:down, 2:up)
    Stored user messages:
    Apr 15 11:37:03 2021 ERROR 0x02030015 Message retry timeout. Check the connection between local and remote gateway endpoints.

    [Tunnel Summary]
    "1" tunnel(s) are found using the previous gateway

      Name: "ToS3" Enabled
        PFS: "Enabled" DH-Group: "14"
        Number of Proposals: "1"
          Proposal "ESP-AES256-SHA256"
            ESP:
              EncryptAlgo: "AES" KeyLen: "32(bytes)"
              AuthAlgo: "SHA2-256" 
              LifeTime: "28800(seconds)" LifeByte: "0(kbytes)"
        Number of Tunnel Routes: "1"
            #1
              Direction: "BOTH"
              "S2_LOCAL_IP_ADDRESS/24<->S3_LOCAL_IP_ADDRESS/24"
    

    [Run-time Info (gateway IKE_SA)]

    [Run-time Info (tunnel IPSEC_SA)]
    "0" IPSEC SA(s) are found under tunnel "ToS3"

    [Run-time Info (tunnel IPSEC_SP)]
    "1" IPSEC SP(s) are found under tunnel "ToS3"
    #1
    Tunnel Endpoint: "S2_IP_ADDRESS->S3_IP_ADDRESS"
    Tunnel Selector: S2_LOCAL_IP_ADDRESS/24 -> S3_LOCAL_IP_ADDRESS/24 Proto: ANY
    Created On: Thu Apr 15 11:02:44 2021
    Gateway Name: "S3"
    Tunnel Name: "ToS3"

    [Address Pairs in Firewalld]
    Address Pairs for tunnel "ToS3"
    Direction: BOTH
    S2_LOCAL_IP_ADDRESS/24 <-> S3_LOCAL_IP_ADDRESS/24

    [Policy checker result]
    Tunnel name: ToS3
    #1 tunnel route S2_LOCAL_IP_ADDRESS/24<->S3_LOCAL_IP_ADDRESS/24
    No policy checker results for this tunnel(no P2SA found or some other error)

    [Related Logs]
    <158>Apr 15 11:36:50 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)******** RECV an IKE packet at S2_IP_ADDRESS:500(socket=14 ifIndex=3) from Peer S3_IP_ADDRESS:500 ********
    <158>Apr 15 11:36:51 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS:500. Gateway-Endpoint:S3 p1saId:0x0
    <158>Apr 15 11:36:51 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS. Gateway-Endpoint:S3 p1saId:0x0
    <158>Apr 15 11:36:54 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)******** RECV an IKE packet at S2_IP_ADDRESS:500(socket=14 ifIndex=3) from Peer S3_IP_ADDRESS:500 ********
    <158>Apr 15 11:36:55 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS. Gateway-Endpoint:S3 p1saId:0x0
    <158>Apr 15 11:36:55 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS:500. Gateway-Endpoint:S3 p1saId:0x0
    <158>Apr 15 11:36:58 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)******** RECV an IKE packet at S2_IP_ADDRESS:500(socket=14 ifIndex=3) from Peer S3_IP_ADDRESS:500 ********
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Resending phase-1 message to S3_IP_ADDRESS:500. Gateway-Endpoint:S3 p1saId:0x0
    <155>Apr 15 11:36:59 iked[1766]: msg_id="0203-0015" (S2_IP_ADDRESS<->S3_IP_ADDRESS)IKE phase-1 negotiation from S2_IP_ADDRESS:500 to S3_IP_ADDRESS failed. Gateway-Endpoint='S3' Reason=Message retry timeout. Check the connection between local and remote gateway endpoints.
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)ike_p1_status_chg: ikePcyName=S3, status=DOWN
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)MWAN-Failover notify ikePcy=0xb90fb8(S3 ver#1), mwanFlags:0x00000000 p1said=0x0 DOWN continuous-fails:110
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)WAN-Failover: start "AlwaysUp" timer(expires in 20s) for ikePcy(S3)
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete Isakmp SA 0x8b33a8 for Gateway S3. State:3
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete QMState SA 0x8b79f0 for Gateway S3
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteQMState: try to delete QMState 0x8b79f0 (ID 0) with IsakmpSA(0x8b33a8) Gateway(S3)
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)SA Nego Fail: saHandle 0x0xbb5048 InitMode 1, reason 2
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)SA Nego Fail: free saHandle, ipsecPcy("ToS3")
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Totally 1 Pending P2 SA Requests Got Dropped.
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One Retry and Life Timer
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One DPD Retry timer
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)ikeSADeleteFromCookieHashTable: IKE SA event: Delete IsakmpSA(0x8b33a8) in IkeIsakmpSATable[82],pPrev((nil)) pNext((nil)) ikePcy(S3) Cookies(i=e4d3a22f15d0fbe7 r=0000000000000000)
    <158>Apr 15 11:36:59 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: reclaim isakmpSA(0x8b33a8)'s memory and mark it as "FREED"
    <155>Apr 15 11:37:03 iked[1766]: msg_id="0203-0015" (S2_IP_ADDRESS<->S3_IP_ADDRESS)IKE phase-1 negotiation from S2_IP_ADDRESS:500 to S3_IP_ADDRESS:500 failed. Gateway-Endpoint='S3' Reason=Message retry timeout. Check the connection between local and remote gateway endpoints.
    <158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)ike_p1_status_chg: ikePcyName=S3, status=DOWN
    <158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)MWAN-Failover notify ikePcy=0xb90fb8(S3 ver#1), mwanFlags:0x00000000 p1said=0x0 DOWN continuous-fails:111
    <158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete Isakmp SA 0x8ae5c8 for Gateway S3. State:5
    <158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)Totally 0 Pending P2 SA Requests Got Dropped.
    <158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One Retry and Life Timer
    <158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One DPD Retry timer
    <158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)ikeSADeleteFromCookieHashTable: IKE SA event: Delete IsakmpSA(0x8ae5c8) in IkeIsakmpSATable[242],pPrev((nil)) pNext((nil)) ikePcy(S3) Cookies(i=3ff82523949702e3 r=98a30467932b57a4)

    <158>Apr 15 11:37:03 iked[1766]: (S2_IP_ADDRESS<->S3_IP_ADDRESS)IkeDeleteIsakmpSA: reclaim isakmpSA(0x8ae5c8)'s memory and mark it as "FREED"

  • Site 3 Side Diagnostic:

    *** WG Diagnostic Report for Gateway "S2" ***
    Created On: Thu Apr 15 10:04:04 2021

    [Conclusion]
    Error Messages for Gateway Endpoint #1(name "S2")
    Apr 15 10:04:00 2021 ERROR 0x02030015 Message retry timeout. Check the connection between local and remote gateway endpoints.

    [Gateway Summary]
    Gateway "S2" contains "1" gateway endpoint(s). IKE Version is IKEv1.
    Gateway Endpoint #1 (name "S2") Enabled
    Mode: Main
    PFS: Disabled AlwaysUp: Disabled
    DPD: Enabled Keepalive: Disabled
    Local ID<->Remote ID: {IP_ADDR(S3_IP_ADDRESS) <-> IP_ADDR(S2_IP_ADDRESS)}
    Local GW_IP<->Remote GW_IP: {S3_IP_ADDRESS <-> S2_IP_ADDRESS}
    Outgoing Interface: eth0 (ifIndex=4)
    ifMark=0x10000
    linkStatus=0 (0:unknown, 1:down, 2:up)
    Stored user messages:
    Apr 15 10:04:00 2021 ERROR 0x02030015 Message retry timeout. Check the connection between local and remote gateway endpoints.

    [Tunnel Summary]
    "1" tunnel(s) are found using the previous gateway

      Name: "ToS2" Enabled
        PFS: "Enabled" DH-Group: "14"
        Number of Proposals: "1"
          Proposal "ESP-AES256-SHA256"
            ESP:
              EncryptAlgo: "AES" KeyLen: "32(bytes)"
              AuthAlgo: "SHA2-256" 
              LifeTime: "28800(seconds)" LifeByte: "0(kbytes)"
        Number of Tunnel Routes: "1"
            #1
              Direction: "BOTH"
              "S3_LOCAL_IP_ADDRESS/24<->S2_LOCAL_IP_ADDRESS/24"
    

    [Run-time Info (gateway IKE_SA)]

    [Run-time Info (tunnel IPSEC_SA)]
    "0" IPSEC SA(s) are found under tunnel "ToS2"

    [Run-time Info (tunnel IPSEC_SP)]
    "1" IPSEC SP(s) are found under tunnel "ToS2"
    #1
    Tunnel Endpoint: "S3_IP_ADDRESS->S2_IP_ADDRESS"
    Tunnel Selector: S3_LOCAL_IP_ADDRESS/24 -> S2_LOCAL_IP_ADDRESS/24 Proto: ANY
    Created On: Thu Apr 15 10:02:51 2021
    Gateway Name: "S2"
    Tunnel Name: "ToS2"

    [Address Pairs in Firewalld]
    Address Pairs for tunnel "ToS2"
    Direction: BOTH
    S3_LOCAL_IP_ADDRESS/24 <-> S2_LOCAL_IP_ADDRESS/24

    [Policy checker result]
    Tunnel name: ToS2
    #1 tunnel route S3_LOCAL_IP_ADDRESS/24<->S2_LOCAL_IP_ADDRESS/24
    No policy checker results for this tunnel(no P2SA found or some other error)

    [Related Logs]
    <158>Apr 15 10:03:47 iked[2686]: alwaysUpTimerCb trigger autoStart for ikePcy(S2) ipsecPcy(ToS2)
    <158>Apr 15 10:03:47 iked[2686]: AUTOSTART: RECV ipecPcy(ToS2), ikePcy(S2), ifIndex(4), tunnel_src=S3_IP_ADDRESS, tunnel_dst=S2_IP_ADDRESS
    <158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)do the ACQUIRE action for the tunnel route [src:S3_LOCAL_IP_ADDRESS/24 <-> dst:S2_LOCAL_IP_ADDRESS/24], ike_ver=1, peer_udp_port=0
    <158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(NATT)IkeFindIsakmpSABySPD: Matched IP and peer_udp_port=0 p1saId=0 : pIsakmpSA p1saID=0 DestPort=0
    <158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(NATT)IkeFindIsakmpSABySPD: Matched IP and peer_udp_port=0 p1saId=0 : pIsakmpSA p1saID=0 DestPort=0
    <158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)StartNegotiation: P1 negotiation is still going on... Increment Pending P2SA counter 1 (Gateway-Endpoint S2)
    <158>Apr 15 10:03:47 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(StartNego) maxPendingP2SARequest 128 current 1
    <158>Apr 15 10:03:48 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)Resending phase-1 message to S2_IP_ADDRESS. Gateway-Endpoint:S2 p1saId:0x0
    <158>Apr 15 10:03:52 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)Resending phase-1 message to S2_IP_ADDRESS. Gateway-Endpoint:S2 p1saId:0x0
    <158>Apr 15 10:03:57 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)Resending phase-1 message to S2_IP_ADDRESS. Gateway-Endpoint:S2 p1saId:0x0
    <158>Apr 15 10:03:58 iked[2686]: *******recv IPSEC_ACQUIRE message, trying to trigger the tunnel negotiation for gateway(S2), tunnel(ToS2), ifindex(4), ifNum(0)
    <158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)do the ACQUIRE action for the tunnel route [src:S3_LOCAL_IP_ADDRESS/24 <-> dst:S2_LOCAL_IP_ADDRESS/24], ike_ver=1, peer_udp_port=0
    <158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(NATT)IkeFindIsakmpSABySPD: Matched IP and peer_udp_port=0 p1saId=0 : pIsakmpSA p1saID=0 DestPort=0
    <158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)(NATT)IkeFindIsakmpSABySPD: Matched IP and peer_udp_port=0 p1saId=0 : pIsakmpSA p1saID=0 DestPort=0
    <158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)StartNegotiation: Already in process of QM negotiation
    <158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)SA Nego Fail: saHandle 0x0x2c7ff218 InitMode 1, reason 2
    <158>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)SA Nego Fail: free saHandle, ipsecPcy("ToS2")
    <155>Apr 15 10:03:58 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)ikeDoXfrmAcquireAction: IkeStartNegotiation failed - Err=-1 Gateway(S2)
    <155>Apr 15 10:04:00 iked[2686]: msg_id="0203-0015" (S3_IP_ADDRESS<->S2_IP_ADDRESS)IKE phase-1 negotiation from S3_IP_ADDRESS:500 to S2_IP_ADDRESS failed. Gateway-Endpoint='S2' Reason=Message retry timeout. Check the connection between local and remote gateway endpoints.
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)ike_p1_status_chg: ikePcyName=S2, status=DOWN
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)MWAN-Failover notify ikePcy=0x2c811b28(S2 ver#1), mwanFlags:0x00000000 p1said=0x0 DOWN continuous-fails:2
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)WAN-Failover: start "AlwaysUp" timer(expires in 20s) for ikePcy(S2)
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete Isakmp SA 0x6175d0 for Gateway S2. State:3
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: try to delete QMState SA 0x6578a8 for Gateway S2
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteQMState: deleting QMState 0x6578a8 (ID 0 state:240) with IsakmpSA(0x6175d0) Gateway(S2)
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)SA Nego Fail: saHandle 0x0x2c629bd8 InitMode 1, reason 2
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)SA Nego Fail: free saHandle, ipsecPcy("ToS2")
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)Totally 1 Pending P2 SA Requests Got Dropped.
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One Retry and Life Timer
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: Stop Phase One DPD Retry timer
    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)ikeSADeleteFromCookieHashTable: IKE SA event: Delete IsakmpSA(0x6175d0) in IkeIsakmpSATable[12],pPrev((nil)) pNext((nil)) ikePcy(S2) Cookies(i=35a2a12f84ce2fe1 r=0000000000000000)

    <158>Apr 15 10:04:00 iked[2686]: (S3_IP_ADDRESS<->S2_IP_ADDRESS)IkeDeleteIsakmpSA: reclaim isakmpSA(0x6175d0)'s memory and mark it as "FREED"

  • For the record, what firewall models & software versions are at each end?

    It looks like the two ends are out of sync when trying to bring up the VPN.

    In the post linked below, changing the "IKE V1 password", which I assume means the shared key, fixed the issue. No idea why.

    https://community.watchguard.com/watchguard-community/discussion/1683/trying-to-track-issue-down
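
To check the "out of sync" impression against the pasted [Related Logs] sections, a rough Python sketch like the one below could tally what each side actually saw. It is only an illustration: the function name `summarize` and the regular expressions are my own, written to match the iked lines quoted in this thread, and are not part of any WatchGuard tool. It counts received IKE packets, phase-1 resends, and phase-1 failures, and collects the cookie pairs from the SA delete lines. An all-zero responder cookie (r=0000000000000000) suggests that SA was torn down without a reply from the peer ever being processed.

```python
import re
from collections import Counter

# Patterns written against the iked lines quoted in this thread (an assumption:
# other Fireware versions may word these messages differently).
PATTERNS = {
    "recv": re.compile(r"RECV an IKE packet"),
    "resend": re.compile(r"Resending phase-1 message"),
    "p1_failed": re.compile(r"IKE phase-1 negotiation .* failed"),
}
COOKIES = re.compile(r"Cookies\(i=([0-9a-f]{16}) r=([0-9a-f]{16})\)")


def summarize(log_text):
    """Count IKE events and collect (initiator, responder) cookie pairs."""
    counts = Counter({name: 0 for name in PATTERNS})
    cookies = []
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
        match = COOKIES.search(line)
        if match:
            cookies.append(match.groups())
    # SAs whose responder cookie is all zeros never recorded a peer reply.
    unanswered = sum(1 for _, responder in cookies if set(responder) == {"0"})
    return {"counts": dict(counts), "cookies": cookies, "unanswered": unanswered}
```

Paste each side's [Related Logs] text into `summarize()` and compare the results. In the excerpts above, the Site 2 log contains "RECV an IKE packet" lines from S3, while the Site 3 excerpt contains none from S2, which may point to IKE traffic getting through in one direction only. That is a guess from two short excerpts, not a diagnosis.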
