Poor BOVPN Performance - Packet Loss
We recently replaced a Cisco ASA at our main office with a Firebox, but we're seeing slightly worse packet loss over our IPsec Branch Office VPNs. We gather statistics 24/7 on ping responses from each site. Before the swap, most of our sites failed to respond to around 0-1% of pings; now most of them fail to respond to around 2-4% (with one site in particular not responding to 7-9%). Round-trip times for successful pings are unchanged, so the problem seems to be packet loss alone. This is causing various small issues, like things occasionally timing out, particularly at the one site that's affected more than the others.
To try to fix this, I've experimented with the following settings: disabling "Enable TCP SYN packet and connection state verification", enabling "TCP MTU Probing", setting "TCP maximum segment size control" to lower values or to "No adjustment", setting the BOVPN's minimum MTU to lower values, and setting the external interface's MTU to lower values. Nothing made any difference at all. We don't have any QoS or Traffic Management settings configured, for the record.
To be honest, for all I know this may not be restricted to BOVPNs. We keep historical statistics for our branch office connections, but I have no prior statistics on the performance of our internet connection itself. In other words, it might be our internet connection in general that's now performing slightly worse than before, not just the BOVPNs. If you all happen to have any clue why this might be happening, I sure would appreciate it.
Thank you for your help. I opened a case with WatchGuard and this is now resolved. The short answer is, it turns out there was nothing wrong with the Firebox at all. Whoops.
Consider opening a support case on this, to get help from a WG rep.
What exactly are you pinging? Something at the far end of the BOVPN tunnel?
How often are the pings?
You can use ping tools such as PingPlotter to see packet loss and ping response times not only through the tunnel, but also to the external interface of the remote firewall. Comparing the two can help identify whether the issue is an ISP one or only a BOVPN-related one.
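If you already log ping results yourself, the comparison can be roughed out in a few lines. This is only a sketch: it assumes you can export each target's results as a list of True/False (reply or no reply), and the function names and the 1-point `margin` are made up for illustration, not anything from PingPlotter or the Firebox.

```python
def loss_percent(replies):
    """Percentage of pings with no reply; replies is a list of booleans."""
    if not replies:
        raise ValueError("no ping samples")
    lost = sum(1 for ok in replies if not ok)
    return 100.0 * lost / len(replies)


def likely_source(tunnel_replies, external_replies, margin=1.0):
    """Rough guess at where the loss is introduced.

    margin is in percentage points and is an arbitrary assumption.
    """
    tunnel = loss_percent(tunnel_replies)
    external = loss_percent(external_replies)
    # Tunnel clearly worse than the plain internet path: suspect the VPN.
    if tunnel - external >= margin:
        return "tunnel"
    # External interface loses packets too: suspect the path or ISP.
    if external >= margin:
        return "path"
    return "inconclusive"
```

If both targets show roughly the same loss, the tunnel itself is probably not the culprit.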
The default for Phase 2 proposals is to rekey the tunnel after 8 hours, and not to rekey based on kilobytes traversing the tunnel.
A rekey will cause a short time where packets will not traverse the tunnel while the tunnel is being rekeyed.
A tunnel will go down if there is no traffic over it for a period of time.
Regular (once per minute) pings across the tunnel should keep it up.
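As a sanity check on whether rekeys alone could explain the loss, here is some back-of-the-envelope arithmetic. The 5-second outage per rekey is purely an assumed figure for illustration; I don't have a measured number.

```python
def rekey_loss_percent(rekey_interval_hours, outage_seconds):
    """Share of time (in percent) the tunnel is unavailable due to rekeys."""
    interval_seconds = rekey_interval_hours * 3600
    return 100.0 * outage_seconds / interval_seconds


# Default 8-hour rekey with an assumed 5-second gap per rekey.
print(round(rekey_loss_percent(8, 5), 3))  # 0.017
```

On those assumptions a rekey accounts for well under 0.1% loss. To reach even 2% from 8-hour rekeys alone, the tunnel would have to be down for about 576 seconds at every rekey, so rekeys on the default schedule are unlikely to be the whole story.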
You can turn on diagnostic logging for IKE, which may show something if the issue is that the tunnel is going down and coming back up:
In WSM Policy Manager: Setup -> Logging -> Diagnostic Log Level -> VPN -> IKE
In the Web UI: System -> Logging -> Settings
Set the slider to Information or higher
Well shoot, this seems really dumb, but because I felt pretty sure the problem wasn't with our overall internet connection (I assumed it'd be noticeable in that case), I was concentrating on the BOVPNs. I did what you said and started recording pings to the external interfaces of a few of our BOVPN sites, and sure enough those show about the same amount of packet loss. So it seems this has nothing to do specifically with our BOVPNs at all, and is instead the firewall in general causing the packet loss. Thanks for the help, I guess that helps narrow it down (or, actually, exactly the opposite, really). You wouldn't have any idea why that might be, would you? I can't think of anything we're doing that would cause this; I don't think our situation is really that complicated. Should I re-post this over in the Networking forum?
Could be a speed/duplex mismatch between a firewall interface and whatever it is connected to - perhaps to the ISP device.
Collisions or errors on an interface often indicate a speed/duplex mismatch.
The only place that I know to see these stats is in WatchGuard System Manager -> Firebox System Manager -> Status Report -> Interfaces section.
The firewall interface default for Link speed is Auto, which is the general recommendation.
However, if your ISP device is set to a specific speed/duplex setting, then the firewall interface should be set to the same value.
Perhaps you have info on what the ISP interface is set to.
When there is a mismatch, the interface set to Auto usually selects half duplex mode, which will likely cause slow throughput and packet loss, and you would expect to see collisions on a busy interface with a mismatch.
Ok, I tried various speed/duplex combinations. The only ones that actually work are Auto Negotiate (which is what it was set to) and 1000 Mbps/Full Duplex. Unfortunately, changing that didn't have any positive effect on our issue. Are you thinking I should open a support case on this, then?
Can't hurt, may help.
So where was the issue?
Your ISP someplace?
I switched us over to our old equipment for a 24-hour period, and found that we were experiencing the exact same packet loss. After scrutinizing our historical data more closely, I found a trend: beginning in June and continuing to the present, packet loss has been creeping upward very slowly. It's not much (we're probably averaging 2-2.5% across all locations at this point, slightly less than I thought when I originally posted this), but it's there. Since we only installed the Firebox in August, it obviously couldn't be to blame.
As for why, though, I actually have no idea! I'll have to look into it some other time, when I have more free time. It's not really causing much disruption, at least not yet, just some minor occasional annoyances. I mentioned one site in particular experiencing a larger amount of packet loss; upgrading their router firmware actually fixed that issue, for the record.
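For anyone who wants to check their own ping history for this kind of slow drift, a plain least-squares slope over the per-period loss figures is one way to do it. This is a sketch, not what I actually ran. The function name is mine, and it assumes one loss percentage per equally spaced period (per week, say).

```python
def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # Covariance of index and value, divided by variance of the index.
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

A slope that stays positive across several months of data, including the period before a hardware swap, points away from the new equipment. The result is in percentage points per period.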