Standby Cluster member causing performance issues

Terry · November 2021

Hi everyone,

We have a pair of WatchGuard firewalls in active/standby mode, and we are trying to figure out some strange behavior. We made some changes to the external interface. Previously, our two colo connections went into two switches, and each WatchGuard plugged into one of those switches, so it was one internet connection into one switch and one WatchGuard into that switch as well. That worked great for 10 years, but we figured it was time to replace the switches, so we plugged the internet connections directly into the firewalls in a one-to-one relationship. For about an hour that seemed to work, and then packets started being dropped. Our internal network servers could connect to internet sites at first, but then could not. It got worse when we failed over to the second unit, thinking the master was the problem. So we failed back and then unplugged the second unit, and everything started to work normally again. Any ideas what may cause this?

One last thing: we moved the trusted interface to the newer switches, which we have had for about 5 years. I just do not know where to start. The logs looked normal, and there were no errors on the switches.

Bruce_Briggs · November 2021

What is the concept of the 2 colo connections? Redundancy?

Most likely the new setup behaves differently from one where both firewalls are connected to the same internet connection via a switch, and that difference is the issue.

Terry · November 2021

Yes, the two connections are for redundancy, as they are connected to two separate switches of the colo provider. The connections are transparent to the firewalls. I suspect the second firewall is trying to take over as master and causing the issues. We are going to continue to do some more testing, and we are looking at putting the connections back through our new switches. I am open to any other suggestions.

DaveDave · November 2021

Hi Terry, Not sure how you are going with this.

I'm stretching my brain, as I hit something similar to this many years ago. I forget the specifics, but in my instance it was something to do with either BPDU filtering or gratuitous ARP. It was affecting the cluster's upstream performance in bizarre and significant ways.

Internal traffic was fine, but the moment it went external I had nothing but problems.

I had another situation with Cisco gear in which the ISP would configure active/backup failover, but the Cisco gear would only fail over when it detected that the port link status itself had gone down. With WatchGuards in passive mode, they never go fully down and still have an active link, hence never triggering a failover.

Do you have a diagram, or could you provide more info about the upstream topology, devices, upstream port config, etc.?

Sorry, without this we may be providing irrelevant info.

Dave.

Terry · November 2021

We figured out the issue: the two connections from the colo could not see each other. Once we plugged the colo connections and the WatchGuards into the switches and enabled the VLAN across the two switches, everything started working. Quite an experience from testing.
