Process guac-standalone

BarryG · March 2020

We have an Active/Passive cluster with 2 M470's. Baseline CPU use usually doesn't peak over 7%. I've set up a bunch of RDP sessions for Covid-19 home users. I see and understand the memory usage, but I'm not sure what this process is that is now using 95% of CPU cycles:
/usr/bin/guac-standalone. /sbin/guacd -f is taking up the rest for the most part.. I think that's AV scanning?

TIA

BarryG · March 2020

To provide some closure for future readers..

I opened a ticket and this process does control the Access Portal function. We currently have 8 connected users, which maxes out the CPU on the Firewall. The ticket is being escalated to WG Engineers to inspect, as my 2nd tier guy says this is all new territory for WG..

I'll post once I hear back from that group on what is considered the new norm.. Otherwise it's back to creating SSL with L2TP/IPSec VPNs and opening RDP sessions, which don't show the same overhead. Just a lot more clicking and training staff..

BShaner · March 2020

Please update with your findings. I have a similar number of users and have noticed guac-standalone using nearly 100% of my M3700's CPU.

BarryG · March 2020

WatchGuard engineers have reported and logged this as an official BUG. They are working on a fix.

Bruce_Briggs · March 2020

For any site which has this issue, they should open a support incident on it.
Then when a fix is available, they will be notified of the fix and will get a link to the fix for installation.

ChrisR · March 2020

Any update on this? New firmware fix coming? I have a similar issue with the same scenario. A lot more Access Portal users. M4600 here. Ranging from ~19-40 remote users. Plus over ~100 SSLVPN and 10 IKEv2

BarryG · March 2020

Nothing yet. But as Bruce says - log your own ticket so you get notified. My M470's have been pegged for about a week now, but my remote users (increasing daily) have not complained about any issues so far. And the FBs haven't crashed off the face of the earth so far..

BarryG · April 2020

Just to update.. my ticket has been escalated again after the Master issued a Fault Report on UserSpace this morning. Portal and users are up and can still connect, but the CPU graph no longer shows the Master's CPU activity.

BarryG · April 2020

Just an update and closure to this thread. Our Cluster Master issued a Kernel Exception a few days after we started using the Access Portal. It rolled over to the Passive device and has been reporting 'normal' CPU usage since then (several days). In our case, about 8 to 15% utilization. WG techs decided 'it's OK now' and closed the ticket.

So I guess if this happens to you, find a time to hard boot the cluster and maybe it will come up smelling roses.

Bruce_Briggs · April 2020

Not the best response from support IMHO

james.carson · April 2020

Hi @BarryG
If you can reply with the case number, I can have the case checked by a team lead and re-opened. We certainly don't want to be closing cases that you're still seeing an issue with.

RMT · April 2020

Hi,
Just to say that we are seeing the same issue with an Active/Passive cluster of 2 M570’s.
We have up to 80 RDP users through the Access Portal on at any one time, and usually the CPU load is between 55 and 70%, but occasionally we see the processor load hit 99% (Red 3) and, crucially, this does not drop away as RDP sessions are ended. The only way to fix the issue is to fail over the cluster and then reboot the offending Firebox (which then clears down the CPU). I assume that this is the bug we are talking about?
The offending process is /usr/bin/guac-standalone 95.81% CPU
We have logged a call with WatchGuard, but their workarounds are to either configure a Mobile VPN for the users that need the RDP connection or configure an Authenticated SNAT Policy to reach RDP servers, which would not really be workable for us.

james.carson · April 2020

Hi @RMT

80 users is quite a few if they're all using RDP. The firewalls will render RDP sessions on the firewall itself in order to send traffic out to the clients via access portal.
If you can reply with your case number, I can ask the support team leads to escalate your case.

Thank you,

BarryG · April 2020

@James_Carson Since our Kernel dump by the Firebox after several days of high processor load, we have not seen a problem. CPU utilization is sitting around 18% with about 15 Access Portal users, 8 L2TP and 3 IKEv2 users..

What I am now seeing, though, is duplicate authentications being logged for Access Portal users.. my workaround is to expire the longest running duplicate logons.

If I had a request of WG, something that would have made for fewer support tickets, it would be to have more examples of the different remote access options. Even video snippets to help end users/Administrators decide which of these access variations would be best for their situation, and how they can be configured. There is a lot of variation in what I've heard from the WG Support staff I've spoken to so far.. It's great that the boxes can support all these variations at the same time.

You Support guys are working hard and it is MUCH appreciated.

james.carson · April 2020

Hi @BarryG
Are you looking for videos on setting up items like the access portal, reverse proxy, etc? I can certainly pass that on to the video team who produces those.

BarryG · April 2020

@James_Carson
Yes, and I think there are some of those already. It's more the IKEv2, L2TP, setting up and configuring the Authentication Portal, and other SSLVPN options. And maybe best security practices for securing those access points. A follow-on would be creating and applying groups and permissions.

If you have time to configure and test, it's not a problem, but with users flying out the doors now at an unprecedented rate, there's a lot more pressure to get these solutions up and running quickly.

And not all of us in small shops can spend a lot of time working only on configuring and testing Fireboxes.

RMT · April 2020

@James_Carson said:
Hi @RMT

80 users is quite a few if they're all using RDP. The firewalls will render RDP sessions on the firewall itself in order to send traffic out to the clients via access portal.
If you can reply with your case number, I can ask the support team leads to escalate your case.

Thank you,

Hi James,
Thank you for your reply.
Our case reference number is 01357900
Before the Covid-19 outbreak we were using a clustered pair of M400's and these were failing over constantly when we started adding more and more user RDP access through the Access Portal to cope with the extra home working demand. This was unusable hence why we upgraded to the M570's.
These are holding out well and touch wood haven't failed over once yet, but we do get this issue with the 100% CPU as advised earlier.
What we are looking for is a definitive answer from WatchGuard engineering as to whether this is a bug or not and, if so, whether they are actively working on a fix?
Another possible option would be to add an additional FireboxV to handle the RDP connections rather than the M570's? On your website it does say that there is a 120-day free trial for this? Is this full functionality, and how would we go about requesting this if needed?
Thanks for your help, much appreciated.
Robert

james.carson · April 2020

Hi @RMT I was able to find your case and asked that the leads have it escalated a few days ago -- so that's already happened. If the case gets stuck, please feel free to reach out to me and I can ask that they do so again.

They are looking into the CPU issues -- but they tend to be different for every installation, as every configuration is a bit different.

For the FireboxV, if you'd like to pursue that route, you'll need to talk to your WatchGuard Sales Rep. If you have their direct contact info, I'd suggest touching base with them and they can get that issued for you.

If you're unsure who that is, you can fill out the form here:
https://www.watchguard.com/wgrd-products/evaluation

and the sales rep assigned to your area will contact you with more information.

Thank you,

BarryG · April 2020

And I'm still seeing the process max out CPU cycles, though infrequently now, as well as Kernel and UserSpace crash logs.

@James_Carson I'll reopen my ticket on the subject so I can get updates.

RMT · April 2020

Received the following update from WatchGuard Support:

My name is ******** and I have been tasked with following up on your case. My office hours are 4:00AM - 12:30PM PST, Monday - Friday. If you require assistance outside of my office hours, please call +1(877) 232-3531 (in the US or Canada) or +1(206) 613-0456 (international) to speak to another representative.

I was able to review the information presented in this case however the problem is linked to a known issue:
FBX-12318:guac-standalone high cpu when RDP abnormally exits

There is currently no resolution to the problem in question and the projected fix is aimed for firmware version 12.6.1 Update1; there may be another CSP version coming up for 12.5.3

The workaround at this moment is to configure Mobile VPN users to access the network for those users in need of a RDP connection, or the other solution would be to create an Authenticated inbound SNAT rule to access the network.

RMT · April 2020

So at least we now know that this is an official bug and WatchGuard engineering are working on a fix which may come out as a CSP release for 12.5.3 (here's hoping)

@James_Carson thank you for all your help with this

BarryG · May 2020

We have been testing out a CSP release patch for this issue and can confirm WG have fixed the issues of high RAM usage, high CPU pinning, and kernel crashing causing failovers. As well, Access Portal users' connections are now being released as expected.

I've been told WG will be rolling out an official 12.5.3 update1 which includes this bug fix shortly. Great work WG.

james.carson · May 2020

Hi @RMT
I'm not sure if you have any other cases open, but I had support reply to the one you mentioned with that patch information. Please take a look at it there if you get a moment.

Thank you.
