Circuit Failure ... Failover too long
Good Morning,
We have a WatchGuard cluster with High Availability. We have two circuits (two fiber links). Both are active and working.
Yesterday, due to a fiber cut, one of the circuits went down. It took the WatchGuard around 10 minutes to bring the internet back up on the secondary circuit. During the first few minutes, I logged into the WatchGuard and could easily see 100% packet loss on one of the circuits, so I knew exactly what was going on. However, I do not understand why it took roughly 10-11 minutes for the failover to take place and for routing to move to the secondary interface. Note: The secondary interface is live. We are using both circuits actively at the same time.
I have checked all of the settings and they look correct.
Any idea what I could be missing? The probe is set to a 5 second interval with 3 consecutive failures.
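For reference, here is a rough sketch of how I am working out the expected detection time from those settings. It is only a simple model and not WatchGuard's actual algorithm: the function name `first_down_time` is made up for illustration, and it assumes the link fails just after a successful probe, so the first failed probe lands one full interval later.

```python
from typing import Optional, Sequence


def first_down_time(
    results: Sequence[bool],
    probe_interval_s: float,
    failures_required: int,
) -> Optional[float]:
    """Return seconds after the cut at which the link would be marked down.

    results: probe outcomes after the cut, in order (True = reply, False = loss).
    Assumes (hypothetically) that probe i is sent (i + 1) intervals after the cut.
    Returns None if the failure threshold is never reached.
    """
    consecutive = 0
    for i, ok in enumerate(results):
        # A single reply resets the counter, since failures must be consecutive.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failures_required:
            return (i + 1) * probe_interval_s
    return None


if __name__ == "__main__":
    # 5 second interval, 3 consecutive failures, every probe lost after the cut.
    print(first_down_time([False, False, False], 5, 3))
```

With a 5 second interval and 3 consecutive failures, this model gives about 15 seconds when every probe is lost, which is nowhere near the 10 minutes we saw. A stray reply in the middle would reset the count, but that should only add a few more intervals.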
Thanks
Comments
What is the destination on your link check? Perhaps the link did not show as down when the fiber was cut because your check IP address is too close to your end.
We often recommend using something a ways up the ISP path, such as the ISP's DNS server.
Under SD-WAN, the Target is 8.8.8.8.
No settings for Link Monitor?
That is what you need.
Under Link Monitor, the Targets are the same for each circuit (8.8.8.8).
Time for a support incident
I can try that. The WatchGuard was showing 100% packet loss nearly right away, so I am not sure why it took so long to fail over.