RADIUS Authentication Issue
I am having a weird issue and was wondering if someone has seen this before....
I am running a hybrid Microsoft environment, so my AD is replicated with Azure. I am rolling out MFA using the Microsoft Authenticator app. I followed the article to set up a new NPS/RADIUS server with the Azure plugin. When I attempt an SSLVPN connection, the app prompts as it should and I can approve the login. This all works until 4 days later. After the 4-day window, the NPS server no longer accepts requests from the WatchGuard, and I see the following error in Event Viewer: "A malformed RADIUS message was received from client Watchguard. The data is the RADIUS message". I have been around and around trying to troubleshoot, but the result is always the same and always after 4 days. If I simply stop and start the NPS service, things work again for another 4 days. I have no scheduled tasks that run, and Windows Firewall is off on that server for all connections (I have already been through the threads that point to Windows Firewall as the cause).
Comments
@Phil I've not personally run into this issue.
Searching through previous cases with that same log line turns up a few customers who had Network Discovery enabled on the Firebox.
(Network Discovery is a feature in the Fireware WebUI that runs network scans to help identify clients, which includes port scans. See https://www.watchguard.com/help/docs/help-center/en-US/Content/en-US/Fireware/services/network_discovery/network_discovery_web.html )
If that feature is currently turned on, it might be worth turning it off for a few days to see if that port scan was interfering with the NPS server.
-James Carson
WatchGuard Customer Support
Now that is something that never would have occurred to me to check. I do have Network Discovery on, and it takes a long time to complete before starting again, so it may fit that timeline. I'll stop discovery for a while and see what happens. The only downside is that it will take 4 or 5 days to confirm.
Thanks, it gives me something to try!
@Phil Network Discovery is a very low priority task - anything else running on the firewall takes precedence over it. If you have anything over the smallest network, it'd likely take a few days to get around to that portion of the scan.
With that said, a port scan shouldn't be crashing a RADIUS server, which is effectively what is happening if that is the problem.
-James Carson
WatchGuard Customer Support
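For anyone wondering why a port scan could show up as a "malformed RADIUS message": RFC 2865 defines a fixed 20-byte header (code, identifier, length, authenticator) followed by type-length-value attributes, and a scan probe landing on the RADIUS port would typically not match that layout. Below is a minimal Python sketch of those structural checks. It is an illustration only, not how NPS actually validates packets; the function name check_radius_packet and the sample payloads are made up for the example.

```python
import struct

RADIUS_HEADER_LEN = 20   # code (1) + identifier (1) + length (2) + authenticator (16)
RADIUS_MAX_LEN = 4096    # maximum packet length allowed by RFC 2865
KNOWN_CODES = {1, 2, 3, 4, 5, 11, 12, 13}  # packet codes listed in RFC 2865


def check_radius_packet(data: bytes) -> str:
    """Return 'ok', or a short reason the datagram is not structurally valid RADIUS.

    Hypothetical helper for illustration; it checks layout only, not the
    shared secret or the authenticator.
    """
    if len(data) < RADIUS_HEADER_LEN:
        return "too short for a RADIUS header"
    code, _identifier, length = struct.unpack("!BBH", data[:4])
    if code not in KNOWN_CODES:
        return f"unknown code {code}"
    if length < RADIUS_HEADER_LEN or length > RADIUS_MAX_LEN:
        return f"length field {length} out of range"
    if length > len(data):
        return "length field exceeds datagram size"
    # Walk the attributes: each is type (1 byte), length (1 byte), value.
    pos = RADIUS_HEADER_LEN
    while pos < length:
        if pos + 2 > length:
            return "truncated attribute header"
        attr_len = data[pos + 1]
        if attr_len < 2 or pos + attr_len > length:
            return "bad attribute length"
        pos += attr_len
    return "ok"


# A well-formed Access-Request carrying only a User-Name attribute (sample data).
good = struct.pack("!BBH", 1, 7, 26) + bytes(16) + bytes([1, 6]) + b"phil"
# The kind of payload a generic scan probe might send (sample data).
probe = b"GET / HTTP/1.0\r\n\r\n"

print(check_radius_packet(good))
print(check_radius_packet(probe))
```

The first call should report the packet as ok and the second should reject the probe as too short. Whether this is what trips NPS in this case is an assumption; a packet capture on the NPS server at the time of the error would be the way to confirm it.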
Well, I will let it sit until Monday and then check the event logs on the NPS server. A simple restart of the NPS service gets things back up and running, so it is not a crash per se. It appears that once the NPS server gets this malformed RADIUS message, it simply stops accepting requests made by the WatchGuard. I get the same error message exactly 8 times in rapid succession (in about 6 or 7 seconds) every 4 days.
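A pattern like "8 errors within a few seconds, every 4 days" can be checked against exported event timestamps to see whether it lines up with a recurring scan. Here is a rough Python sketch that groups timestamps into bursts and measures the time between bursts. The helper names and the sample timestamps are hypothetical and only stand in for whatever is exported from Event Viewer.

```python
from datetime import datetime, timedelta


def group_bursts(timestamps, gap=timedelta(seconds=30)):
    """Group event timestamps into bursts; a new burst starts after a pause longer than `gap`."""
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts


def burst_intervals(bursts):
    """Time from the start of each burst to the start of the next one."""
    return [later[0] - earlier[0] for earlier, later in zip(bursts, bursts[1:])]


# Made-up sample: two bursts of 8 events, one second apart, 4 days between bursts.
start = datetime(2024, 1, 1, 3, 0, 0)
sample = [start + timedelta(days=4 * d, seconds=s) for d in range(2) for s in range(8)]

bursts = group_bursts(sample)
print([len(b) for b in bursts])
print(burst_intervals(bursts))
```

If the interval between bursts matches how long a full Network Discovery pass takes, that supports the port scan theory; if not, something else is sending the bad packets.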
This is looking good! I checked the NPS server this morning and found no Event #15. I was able to do an MFA verification using RADIUS and SSLVPN with no issues. I had been pulling my hair out for a couple of weeks trying to determine what was causing this problem. Thanks, James, for the quick reply! Keeping my fingers crossed that this continues to work for the remainder of the week. I am anxious to roll out MFA for everyone.