FireCluster Failover

ARamsey · August 2021

We are working on an HA configuration for our WatchGuard FWs. We use a highly redundant infrastructure model built on switch stacks, and all connections that aren't clients are LAGged across switches. So if we are doing maintenance on a switch and it needs to be rebooted, nothing has to fail over. However, what happens if a LAG on our WatchGuards is stretched across two switches and we reboot one of them? Does one link in the LAG going down actually cause a failover?

james.carson · August 2021

Depending on how the interfaces are configured (there's a checkbox to monitor or not monitor each interface), a link going down may or may not trigger a failover.

The cluster system looks at a score, the WAI (Weighted Average Index), which is an average of four scores. One of these is tied to the firewall's ports being up.

If one member of a LAG goes down, the firewall will compare its WAI to that of the other (backup) firewall. If the backup firewall has a higher score, the cluster will fail over to that member.

If both cluster members' LAGs are plugged into the same switch and that switch goes down, nothing should occur. If only one cluster member loses an interface and the other does not, the member that still has all of its interfaces takes over.
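To make the comparison concrete, here is a minimal Python sketch of the decision described above. It is a toy model under stated assumptions, not WatchGuard's actual formula: the 0-100 scale, equal weighting of the four indices, a port index computed as the share of monitored links that are up, and the hypothetical `Member` class and `should_fail_over` helper are all illustrative choices. Any thresholds or hysteresis the real product applies are not modeled.

```python
from dataclasses import dataclass

# Toy model of a FireCluster health comparison. ASSUMPTIONS (not from
# WatchGuard documentation): every index is on a 0-100 scale, the four
# indices are weighted equally, and the port index is the percentage of
# monitored physical links (e.g. LAG members) that are up.


@dataclass
class Member:
    name: str
    links_up: int  # monitored physical links currently up
    links_total: int  # monitored physical links configured
    # The other three indices, assumed fully healthy for this example.
    other_indices: tuple = (100.0, 100.0, 100.0)

    def port_index(self) -> float:
        """Percentage of monitored links that are up."""
        return 100.0 * self.links_up / self.links_total

    def wai(self) -> float:
        """Hypothetical WAI: plain average of the port index and the other three."""
        indices = (self.port_index(), *self.other_indices)
        return sum(indices) / len(indices)


def should_fail_over(active: Member, backup: Member) -> bool:
    """Fail over only when the backup is strictly healthier than the active member."""
    return backup.wai() > active.wai()


if __name__ == "__main__":
    # Stretched LAG, one switch rebooted: both members lose one of two links.
    active = Member("active", links_up=1, links_total=2)
    backup = Member("backup", links_up=1, links_total=2)
    print(active.wai(), backup.wai(), should_fail_over(active, backup))
    # prints: 87.5 87.5 False

    # Only the active member loses a link: the backup now scores higher.
    active = Member("active", links_up=1, links_total=2)
    backup = Member("backup", links_up=2, links_total=2)
    print(active.wai(), backup.wai(), should_fail_over(active, backup))
    # prints: 87.5 100.0 True
```

In this toy model, equal link loss on both members leaves their scores equal, so nothing moves; only an asymmetric loss tips the comparison toward the backup.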

It's also worth noting that cluster failover should be transparent to users, and no capabilities are lost when switching from one member to the other. If a failover does occur, the only indications would be the failover notification and any VPNs that were up quickly rebuilding.

ARamsey · August 2021

Thank you for your comment. So if I "stretch" the LAG across two switches and the primary and secondary both lose a link in the LAG, the FireCluster should not fail over in that case. Thank you for the information.

james.carson · August 2021

@ARamsey
It shouldn't, provided that the switches in your stack replicate the same traffic between each other. If you intend to do this, I'd suggest testing failovers to make sure that's the case.

If you fail over in that configuration and no traffic flows until you fail back, your switch may not be forwarding all traffic.

(Oversimplified explanation: the clustered firewalls share the same MAC address for each port, and the master is just the one that's allowed to send reply traffic at any given time. Having the same MAC present on two different switches can sometimes cause issues and lead the switches to not forward traffic to BOTH firewalls.)
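The MAC-learning behavior behind this can be sketched with a toy transparent bridge in Python. This is a generic learning-switch model, not a model of any specific switch stack or of how FireCluster signals a failover; the `LearningSwitch` class, the port names, and the MAC addresses are hypothetical. A stack that behaves as one logical switch keeps a single MAC table, and a learned unicast MAC points to exactly one port, so frames for the shared cluster MAC reach only the firewall the switch last saw using it.

```python
# Toy transparent bridge (hypothetical, for illustration only).
class LearningSwitch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn where src_mac lives, then return the ports the frame is sent out of."""
        # Last sender wins: if two ports use the same MAC, the entry "moves".
        self.mac_table[src_mac] = in_port
        out = self.mac_table.get(dst_mac)
        if out is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        # Known destination: forward to exactly one port (drop if it is the ingress).
        return [] if out == in_port else [out]


# Hypothetical, locally administered addresses.
CLUSTER_MAC = "02:00:00:00:00:01"  # shared by both firewalls
CLIENT_MAC = "02:00:00:00:00:99"

if __name__ == "__main__":
    sw = LearningSwitch(["fw1", "fw2", "client"])
    sw.receive("fw1", CLUSTER_MAC, CLIENT_MAC)  # fw1 is master and replies
    print(sw.receive("client", CLIENT_MAC, CLUSTER_MAC))  # ['fw1']

    # Failover: fw2 is now master, but the switch has not seen it use the MAC,
    # so client traffic is still sent only to fw1.
    print(sw.receive("client", CLIENT_MAC, CLUSTER_MAC))  # ['fw1']

    # Once fw2 sends a frame from the shared MAC, the table entry moves.
    sw.receive("fw2", CLUSTER_MAC, CLIENT_MAC)
    print(sw.receive("client", CLIENT_MAC, CLUSTER_MAC))  # ['fw2']
```

In this toy, any frame the new master sends from the shared MAC corrects the table. If a switch keeps a stale entry instead, traffic keeps going to the old master, which matches the "no traffic until you fail back" symptom described above.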
