Specific devices becoming unreachable
I have a weird issue that I think might be related to TCP timeouts, but I haven't been able to diagnose it.
Recently replaced Fortigate firewalls with WG T45 and T25. Servers are in Azure, BOVPNs are set up and operating properly.
Badge reader control panels are set to static IP and connect to port 3001 over BOVPN to Azure server hosting DB for backend.
Randomly several panels will go offline and can't be pinged from the server or local network.
I can however still ping the devices from the WG diagnostic page. Traffic log shows allow on all packets from server.
A reboot to the panels will reconnect them, but it is temporary.
I am at a loss as to what is happening here. Any advice on where to start would be helpful.
Answers
Looks like it was this. After digging through logs I found the answer.
https://community.watchguard.com/watchguard-community/discussion/1584/tcp-syn-checking-exception
Hi @AGreen
If you're seeing these internally, it usually means that the device that you're having a problem with isn't doing a good job at keeping its TCP connections open. (They should be sending TCP Keepalives in order to keep that connection fresh/open in the upstream router's connection table.)
While disabling the TCP syn check box may correct the symptom, it does not do anything to address the actual problem -- which may cause that traffic to get dropped elsewhere in your network.
-James Carson
WatchGuard Customer Support
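For anyone unfamiliar with what "sending TCP keepalives" means on the application side, here is a minimal Python sketch. It is not the panel firmware or the vendor's server software (neither is visible in this thread); it only illustrates how a program asks the OS to send keepalive probes on an idle connection so that intermediate firewalls keep the entry in their connection table. The function name `enable_keepalive` and the timing values are assumptions chosen for illustration, and the three tuning options use the Linux names, so they are skipped on platforms that do not expose them.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the OS to send TCP keepalive probes on this socket.

    idle, interval and count are illustrative values, not vendor settings.
    """
    # SO_KEEPALIVE is the portable on/off switch.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning uses Linux option names; skip any that are missing.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the connection is dropped
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock

if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The idle value only helps if it is shorter than the idle timeout of every firewall in the path, which is why the timeout configured on the policy matters.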
Hey James,
You are correct that disabling the TCP syn check didn't resolve the problem. At this point I created a custom firewall policy with the required port and the server and panels in the to/from to extend the timeout for just that connection to see if it resolves the issue.
Will likely open a support ticket with the vendor, but was trying to not go that route as our support contract expired.
I am honestly surprised that those panels aren't sending keepalives to maintain the connection.
You can do packet captures using tcpdump on the firewall.
You can specify captures for specific IP addrs etc. using the Advanced Options.
https://www.watchguard.com/help/docs/help-center/en-US/Content/en-US/Fireware/fsm/log_message_learn_more_wsm.html
This should show whether keepalives are being sent or not.
I have been looking at pcaps from the WG for the past 3 days. I can't find anything to show why the device becomes unreachable. I see SYN requests for the port to the panel from the server, but no response to the SYN packets and no response when pinging. Something in the connection is causing the WG to block all traffic to the device until I reboot it, but there are no deny packets, only allow packets.
It's very weird, as I can only ping the device from the WG when this occurs. I have tried creating a custom TCP filter with a longer timeout for TCP connections but it didn't resolve the issue.
At a complete loss as to what is happening.
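One way to narrow down when a panel stops answering is to run a small TCP probe from several vantage points (the Azure server, a PC on the panel's subnet, a PC on another local subnet) and compare the timestamps. The sketch below is only an illustration: it assumes the panel accepts connections on port 3001, as the SYNs from the server suggest, and the address in the usage lines is a placeholder. The names `probe` and `watch` are hypothetical helpers.

```python
import socket
import time

def probe(host, port, timeout=3.0):
    """Try one TCP connect and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # SYN answered with SYN/ACK
    except ConnectionRefusedError:
        return "refused"           # host answered with RST: reachable, port closed
    except (socket.timeout, TimeoutError):
        return "timeout"           # no answer at all, matching the SYN-only pcaps
    except OSError:
        return "unreachable"       # e.g. no route to host

def watch(host, port, interval=30.0, rounds=10):
    """Probe repeatedly and print a timestamped line whenever the state changes."""
    last = None
    for _ in range(rounds):
        state = probe(host, port)
        if state != last:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), host, port, state)
            last = state
        time.sleep(interval)

if __name__ == "__main__":
    # Placeholder address; replace with a real panel IP.
    watch("192.0.2.10", 3001, interval=30.0, rounds=3)
```

If the probe reports `timeout` from the same subnet as the panel, the firewall is not in that path, which would point toward the panel or the switch rather than the WG.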
Still no idea what is happening here. If I bypass the WG and connect directly with a static to the device it responds. PCAPs from the WG don't show anything other than the server sending SYN packets. If I do a PCAP on the devices that are online, I see no traffic, even though I know there is traffic being sent.
Pings to the panels from the local network will go directly via Ethernet and not via the firewall.
This suggests an issue with the panels.
Perhaps a NIC driver needs updating?
@AGreen best I can suggest here is to create a support case so our support team can assist. (I would suggest attaching the pcap you took along with any filters/arguments you used to generate it in the case.)
-James Carson
WatchGuard Customer Support