Very weird bug with interfaces reassigning themselves??
Hey all,
We've got a few WGs and have been using them for years with good results.
Last night I was planning on doing a firmware update to one of our M370s. It's on 12.6.2.
It had been running for a few months without a reboot, so before doing the upgrade, I decided to reboot it.
I rebooted and waited, and waited, and waited. It never came back online. We have two WANs coming into this WG, and neither was functioning.
I was doing all of this from 400 miles away, so I got ahold of someone local and asked them to go in. They connected their laptop to their cellphone hotspot and also plugged it in via Ethernet. I was able to get into their laptop via our remote support app.
So I logged into the WG and found it was up and running internally just fine. However, our primary internet connection, on eth5, showed the interface as "down". Our secondary one, on eth4, showed as "up", but it was not receiving any data on eth4 and could not reach the ISP's gateway.
I asked the onsite person to send me a photo of the unit, and when I got it, I was more confused. The photo showed eth4 with no link lights, and eth5 with link lights. This is the opposite of what was reported in the WG web interface.
The onsite person power-cycled the ISPs' routers, but that didn't solve anything. We rebooted the WG again, and that didn't solve anything either.
So, grasping at straws and on a bit of a hunch, I moved the IP info for the ISP that was physically connected to eth5 over to the eth4 section in the WG, and immediately the connection came up.
So now we have a connection that is physically in eth5, but configured in software as eth4. All this happened after a reboot; the upgrade still hasn't even been attempted yet.
Anyone ever see anything like this before? It seems like the WG interface count might be shifted somehow.
The only other bit of oddness I can think of is the reason we were going to do the upgrade in the first place. Now that staff are returning to the office, we've noticed that when they're on Zoom/Teams/Meet calls, they semi-frequently get a warning that their connection isn't stable, and others report that they freeze up briefly. This wasn't limited to any one user. Looking into it, we were seeing odd latency spikes: latency between two servers would normally be sub-1ms, then for a few seconds shoot up to over 2000ms. After some digging, we decided it was probably the WG. The WG is acting as our router between our different VLANs too. If we bypassed the WG, we didn't see those latency spikes. CPU and RAM usage on the WG were both normal and low. We're only talking a couple dozen people in the office at this point, using maybe 30Mbps of our 200Mbps bandwidth. The latency is all internal, not internet related at all. That latency is still there after tonight's issues. Maybe the upgraded firmware will resolve that, but I'm not going to test it out at 1am and risk breaking the whole unit again.
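For anyone who wants to check their own ping logs for the same pattern, here is a rough Python sketch of how spikes like these could be picked out of a list of round-trip times. The function name `find_spikes`, the 100ms threshold, and the sample values are all made up for illustration; they are not real measurements from our network.

```python
def find_spikes(samples_ms, threshold_ms=100.0):
    """Group consecutive RTT samples above threshold_ms into spikes.

    Returns a list of (first_index, last_index, peak_ms) tuples.
    The threshold is an arbitrary assumption, not a vendor figure.
    """
    spikes = []
    start = None  # index where the current spike began, if any
    for i, rtt in enumerate(samples_ms):
        if rtt > threshold_ms:
            if start is None:
                start = i
        elif start is not None:
            # Spike ended on the previous sample.
            spikes.append((start, i - 1, max(samples_ms[start:i])))
            start = None
    if start is not None:
        # Spike was still in progress when the samples ran out.
        spikes.append((start, len(samples_ms) - 1, max(samples_ms[start:])))
    return spikes


# Illustrative values only: sub-1ms normally, then a burst over 2000ms.
samples = [0.4, 0.5, 2100.0, 1800.0, 0.6, 0.4]
spikes = find_spikes(samples)
print(spikes)
```

Feeding it timestamps alongside the indexes would make it easier to line the spikes up with firewall logs.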
Comments
Hi @GreenEnvy22
If an interface completely fails on a firewall and isn't detected when the system starts up, the remaining interfaces may renumber under certain circumstances. The firewall assigns interface numbers while it's booting up, so since this happened during a reboot, that may well be the cause.
This is pretty rare, but does happen from time to time.
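As a toy illustration of that kind of shift (not the firewall's actual boot logic), the Python sketch below assumes names are handed out sequentially to whichever ports are detected. The function name `enumerate_interfaces`, the 8-port layout, and the choice of port 3 as the failed one are all hypothetical.

```python
def enumerate_interfaces(physical_ports, failed):
    """Assign eth names in order to the ports that were detected.

    Toy model only: assumes sequential naming of detected ports,
    which is an assumption, not documented vendor behavior.
    """
    detected = [p for p in physical_ports if p not in failed]
    return {f"eth{i}": port for i, port in enumerate(detected)}


ports = list(range(8))  # hypothetical physical ports 0..7

normal = enumerate_interfaces(ports, failed=set())
shifted = enumerate_interfaces(ports, failed={3})  # port 3 not detected

# Every name at or above the failed port now points one port higher.
print(normal["eth4"], shifted["eth4"])
```

In this model the software name eth4 ends up on physical port 5, which matches the symptom reported above, and the last name (eth7) disappears entirely.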
Barring damage due to an act of nature/act of God ("force majeure": flood, lightning strike, power surge, etc.), a device that fails this way would normally be RMA'ed and swapped with a replacement of the same model.
If your device has an active support contract, I'd suggest contacting support to create a case. You can do so by clicking the support center link at the top right of the page.
If you already have a case open, please reply with the case number and I can check to ensure that it's with the right team.
It's also worth noting that current from power surges can be carried up Ethernet cables into your firewall. Several companies sell surge protectors for Ethernet cables that go inline between the ISP equipment and your network and connect to a ground screw, so that surge current is shunted to ground.
-James Carson
WatchGuard Customer Support
Thanks James, we do have a support contract and will open a ticket soon. We verified our backup Comcast connection also is off by 1 port.
I'm also testing a theory that it's a corrupt config file, so I'm building a new one from scratch that I'll load in from Policy Manager, mostly just for curiosity's sake.
For now we're up and running at least. We probably won't do much for about a week, as we'll want to schedule the bit of downtime for doing an RMA.