Sporadic DNS Issue with multiple users over SSL-VPN

We have a lingering DNS problem that has persisted through version upgrades. Occasionally, while users are connected to SSL VPN with the WatchGuard client, DNS queries are sent to the internet instead of to our internal DNS servers. Because our internet domain matches our internal domain, these queries resolve, and the public answers stick until we run ipconfig /flushdns. We've been using this workaround for years; the problem doesn't follow any reliable pattern.

This happens both on machines where we've adjusted the automatic interface metric and on those where we haven't. My assumption is that our DNS servers occasionally don't reply quickly enough, or a packet gets dropped here or there, and resolution eventually falls over to another adapter's DNS settings.

Any ideas on how we can resolve/mitigate the issue or what the cause may be?

Comments

  • The cause is simply a conflict with your public DNS zone, since your internal and public domains share a name. I never have issues with .local domains.

    There are 3 fixes:
    1st fix (recommended if you have the bandwidth for it): Turn off split tunneling and route all traffic through the VPN. This forces your users to use the internal DNS servers before anything is routed out.

    2nd fix: Manually update the HOSTS file and make sure DNS queries are forced to your internal DNS servers. I believe you can push out a custom HOSTS file through GP.

    3rd fix (least recommended): Update the interface metrics. The issue, of course, is that machines don't always keep the same interfaces, and Windows updates like to break these settings.

    I'd say try the 1st fix and see whether it kills performance. If it does, I'd advise switching to IKEv2 and going from there. The only issue is that IKEv2 won't append the DNS suffix automagically like SSL VPN does, so your users would have to use the FQDN rather than the bare hostname.
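The HOSTS-file approach (2nd fix) can be scripted so the same entries can be pushed out repeatedly without piling up duplicate lines. A minimal Python sketch, assuming you maintain a list of internal names and addresses; the hostnames, IPs, and the BEGIN/END marker comments are hypothetical placeholders. On Windows the file lives at C:\Windows\System32\drivers\etc\hosts. HOSTS entries bypass DNS entirely, so they have to be kept in sync with your internal records by hand.

```python
from typing import Dict

# Hypothetical marker comments delimiting the managed section of the HOSTS file.
BEGIN = "# BEGIN vpn-internal-hosts"
END = "# END vpn-internal-hosts"


def render_hosts_block(entries: Dict[str, str]) -> str:
    """Render hostname -> IP pairs as HOSTS-file lines between the markers."""
    lines = [BEGIN]
    for name, ip in sorted(entries.items()):
        lines.append(f"{ip}\t{name}")
    lines.append(END)
    return "\n".join(lines) + "\n"


def merge_into_hosts(existing: str, entries: Dict[str, str]) -> str:
    """Drop any previously managed block, then append a fresh one.

    Running this twice with the same entries yields the same file (idempotent).
    """
    kept, skipping = [], False
    for line in existing.splitlines():
        if line.strip() == BEGIN:
            skipping = True
        elif line.strip() == END:
            skipping = False
        elif not skipping:
            kept.append(line)
    body = "\n".join(kept).rstrip("\n")
    return (body + "\n" if body else "") + render_hosts_block(entries)


if __name__ == "__main__":
    # Hypothetical internal names and addresses.
    internal = {
        "intranet.example.com": "10.0.0.20",
        "fileserver.example.com": "10.0.0.21",
    }
    current = "127.0.0.1\tlocalhost\n"
    print(merge_into_hosts(current, internal), end="")
```

Keeping the managed entries between markers means a later push replaces only that block and leaves any local lines untouched.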

  • The following article explains how VPN name resolution works and implies that the DNS cache is the first place checked.

    VPN name resolution
    https://docs.microsoft.com/en-us/windows/security/identity-protection/vpn/vpn-name-resolution

    You could use the OpenVPN client and have it automatically run a script after connecting that clears the DNS cache.
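A minimal sketch of such a post-connect script in Python, assuming Python is available on the clients and that the VPN client can launch a command once the tunnel is up (OpenVPN's `up` directive is one such hook; whether the WatchGuard client offers an equivalent isn't covered in this thread). It simply automates the same `ipconfig /flushdns` workaround from the original post.

```python
import platform
import subprocess
from typing import List

# The manual workaround from the original post, run automatically instead.
FLUSH_CMD: List[str] = ["ipconfig", "/flushdns"]


def flush_dns_cache(dry_run: bool = False) -> List[str]:
    """Clear the Windows DNS client cache so stale public answers are dropped.

    With dry_run=True nothing is executed; the command is only returned.
    """
    if not dry_run:
        if platform.system() != "Windows":
            raise RuntimeError("ipconfig /flushdns is only available on Windows")
        # check=True raises CalledProcessError if ipconfig exits non-zero.
        subprocess.run(FLUSH_CMD, check=True, capture_output=True, text=True)
    return list(FLUSH_CMD)


if __name__ == "__main__":
    # Only actually flush when running on Windows; elsewhere just show the command.
    print(" ".join(flush_dns_cache(dry_run=platform.system() != "Windows")))
```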

  • While the true FIX is to use a subdomain of the public domain as your internal domain and avoid this issue in the first place (and not resort to using .local domains that cause their own internal issues), a workaround may be to have your Firebox hand out your Active Directory DNS server's IP via DHCP, then apply a registry workaround on the remote devices to force them to always query the primary DNS server first. With the registry setting noted below, devices query the first DNS server in the list every time. As long as that server is reachable, they'll work fine; if they get no response, they fall through to the secondary DNS server for THAT lookup only. The next lookup goes back to the primary instead of staying glued to the secondary that responded and waiting for it to fail, as the normal process would do.
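To make the difference concrete, here is a toy Python model of the two behaviors described above. It is NOT the actual Windows DNS Client algorithm (which also involves per-adapter server lists and timeouts); the server names and the single dropped reply are hypothetical. `sticky=True` stands in for the default behavior (stay on whichever server last answered), `sticky=False` for ServerPriorityTimeLimit set to 0 (start from the first server on every query).

```python
from typing import Callable, List, Optional, Set


class DnsClientModel:
    """Toy model of DNS server selection; not the real Windows implementation."""

    def __init__(self, servers: List[str], sticky: bool) -> None:
        self.servers = servers  # in configured order, primary first
        self.sticky = sticky
        self.preferred = 0      # index of the server tried first

    def resolve(self, reachable: Callable[[str], bool]) -> Optional[str]:
        """Return the server that answered this query, or None if none did."""
        start = self.preferred if self.sticky else 0
        order = self.servers[start:] + self.servers[:start]
        for server in order:
            if reachable(server):
                if self.sticky:
                    # Default-like behavior: remember the server that answered.
                    self.preferred = self.servers.index(server)
                return server
        return None


def simulate(sticky: bool, outages: List[Set[str]]) -> List[Optional[str]]:
    """Run one query per entry in outages; each set lists servers that miss that query."""
    client = DnsClientModel(["internal", "public"], sticky)
    return [client.resolve(lambda s, down=down: s not in down) for down in outages]


if __name__ == "__main__":
    # Query 1: the internal server's reply is lost; queries 2 and 3 are fine.
    blip = [{"internal"}, set(), set()]
    print("default :", simulate(True, blip))   # stays on "public"
    print("limit=0 :", simulate(False, blip))  # returns to "internal"
```

In the model, one lost reply is enough to leave the default client answering from the public server for every later query, which matches the "stuck until flushdns" symptom in the original post; with the setting at 0 only the affected query goes elsewhere.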

    This link https://support.microsoft.com/en-us/help/320760 is now dead, but the content needed is below. It was written for XP and still works in Windows 10.

    To work around this behavior, modify the registry so that the DNS server that is configured first is tried first on each query.

    Follow these steps, and then quit Registry Editor:

    Run regedit and then click the following key in the registry:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters

    On the Edit menu, point to New, and then click REG_DWORD.

    Type ServerPriorityTimeLimit, and then press ENTER.

    On the Edit menu, click Modify.

    Type 0, and then click OK.

    When you set ServerPriorityTimeLimit to 0 (zero), the server priorities are reset before the DNS Client service decides which DNS server to use. You must restart Windows XP for these changes to take effect.

    Note: For the ServerPriorityTimeLimit registry setting, only a value of 0 changes the default behavior. All other values keep the default behavior.

    Gregg Hill
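The manual regedit steps above can also be captured in a .reg file and imported on each remote machine (by double-clicking it, or with `reg import <file>` from an elevated prompt, since the key is under HKEY_LOCAL_MACHINE). A minimal Python sketch that generates such a file; the output filename is a hypothetical placeholder, while the key and value name are the ones from the KB text quoted above. Per that text, a restart is still needed afterwards.

```python
from pathlib import Path

# Key and value name taken from the quoted KB 320760 steps.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters"
VALUE_NAME = "ServerPriorityTimeLimit"


def render_reg(value: int = 0) -> str:
    """Return .reg text that creates the REG_DWORD value (0 = always try primary first)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("REG_DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KEY}]\r\n"
        f'"{VALUE_NAME}"=dword:{value:08x}\r\n'
    )


def write_reg(path: Path, value: int = 0) -> None:
    """Write the file as UTF-16 LE with a BOM, the encoding regedit itself exports."""
    path.write_bytes(b"\xff\xfe" + render_reg(value).encode("utf-16-le"))


if __name__ == "__main__":
    write_reg(Path("dns-server-priority.reg"))  # hypothetical filename
    print(render_reg(), end="")
```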

  • @greggmh123 said:
    (and not resort to using .local domains that cause their own internal issues)

    What issues have you run into with .local domains? All ".local" domains I have ever used have worked fine, with fewer issues than ".com" ones or even subdomains.

  • edited August 2022

    I wanted to let the folks who helped know our final solution. We tried the DNScache registry change, and it resolved a good bit of the issue but not all of it.

    Our final solution was to have the VPN users send ALL traffic through the tunnel, and then add a firewall rule blocking all DNS queries to anything but our internal servers. I don't love it, but it has solved our issue.
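The policy described above (all traffic through the tunnel, DNS allowed only to the internal servers) comes down to a simple per-packet decision. A vendor-neutral Python sketch of that rule logic, not Firebox configuration syntax; the internal server addresses are hypothetical, and it only matches classic DNS on port 53.

```python
import ipaddress
from typing import Iterable

DNS_PORT = 53  # classic DNS, over both UDP and TCP


def allow_vpn_packet(dst_ip: str, dst_port: int, internal_dns: Iterable[str]) -> bool:
    """Decide whether a packet from a full-tunnel VPN client passes this rule.

    Non-DNS traffic is left to the other policies (returns True here);
    DNS traffic is allowed only when it targets one of the internal servers.
    """
    if dst_port != DNS_PORT:
        return True
    allowed = {ipaddress.ip_address(s) for s in internal_dns}
    return ipaddress.ip_address(dst_ip) in allowed


if __name__ == "__main__":
    internal = ["10.0.0.10", "10.0.0.11"]  # hypothetical AD DNS servers
    for dst in ("10.0.0.10", "8.8.8.8"):
        verdict = "allow" if allow_vpn_packet(dst, 53, internal) else "deny"
        print(f"DNS to {dst}: {verdict}")
```

Because every query now has to go to an internal server, a dropped reply can no longer fall over to a public resolver; the client simply retries internally.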
