Troubleshooting: Logon Duration – Root Cause is UPM Profiles
End users are complaining of slow logons, and the issue occurs at random, affecting different users at different times of the day.
Identifying Root Cause
When trying to identify the root cause of a Citrix end-user experience issue, one of the key pieces of information is whether the issue is widespread or isolated. The quickest way to make that determination is by using the Topology View within Goliath Performance Monitor.
The Topology View provides access to a high-level overview of the Citrix environment. This view is automatically built and scales to any size environment and even multiple environments.
To access the Topology View, click Dashboard then Topology. This will display a window similar to the following:
The first metric you’ll want to review is ICA Latency. ICA Latency is the time from when a user executes a keystroke or mouse click to when it is processed on the session host. It includes the network latency and any delay on the session host in processing the request. Normally, users will notice slowness when ICA Latency is sustained above 250 ms, and the experience will be significantly impacted when it stays above 400 ms for a sustained period of time (2-5 minutes).
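The thresholds above can be expressed as a simple check. The following is a minimal sketch, not part of Goliath Performance Monitor: the sample format (timestamp in seconds, latency in milliseconds) and the function names are hypothetical, and it assumes the lower bound of the 2-5 minute window (120 seconds) counts as "sustained" for both thresholds.

```python
# Thresholds taken from the text; the "sustained" window is an assumption
# (lower bound of the 2-5 minute range), applied to both thresholds.
NOTICEABLE_MS = 250
SEVERE_MS = 400
SUSTAINED_SECONDS = 120


def longest_run_seconds(samples, threshold_ms):
    """Return the longest span, in seconds, between the first and last
    consecutive samples whose latency stayed above threshold_ms.

    samples: list of (timestamp_seconds, latency_ms), sorted by time.
    """
    longest = 0.0
    run_start = None
    for timestamp, latency_ms in samples:
        if latency_ms > threshold_ms:
            if run_start is None:
                run_start = timestamp
            longest = max(longest, timestamp - run_start)
        else:
            run_start = None  # the run is broken by a sample under the threshold
    return longest


def classify_ica_latency(samples):
    """Classify a session as 'severe', 'noticeable' or 'ok'."""
    if longest_run_seconds(samples, SEVERE_MS) >= SUSTAINED_SECONDS:
        return "severe"
    if longest_run_seconds(samples, NOTICEABLE_MS) >= SUSTAINED_SECONDS:
        return "noticeable"
    return "ok"
```

A short spike above 400 ms is deliberately classified as "ok" here, since the guidance concerns sustained latency rather than momentary peaks.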
When there is high ICA Latency, first compare it to the network latency to determine what is causing the delay. If network latency is responsible for the spike, then networking is most likely the cause and you’ll want to review the Connection Performance metrics. If those metrics look OK, then server performance and resource availability are the likely cause of the delay. When this is the case, you’ll first want to look at the ICA RTT, then move over to the Server Performance tab of the drill down. This workflow is reflected in the data samples below. You’ll want to look for a direct correlation between RTT and processor utilization or RAM on each analyzed session.
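If you export the RTT and resource samples for a session, the correlation can also be checked numerically rather than by eye. This is a minimal sketch using a plain Pearson correlation coefficient; the function name and the input lists are hypothetical, and it assumes the two series were sampled at the same moments.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series.

    Returns 0.0 when either series is constant, since the coefficient
    is undefined in that case.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical samples for one session, taken at the same intervals.
    rtt_ms = [80, 120, 310, 450, 420, 150]
    cpu_percent = [35, 48, 88, 97, 95, 52]
    print(f"RTT vs CPU correlation: {pearson(rtt_ms, cpu_percent):.2f}")
```

A value close to 1 suggests RTT rises and falls with the resource metric; a value near 0 suggests the two are unrelated for that session.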
The Topology View is divided into 3 sections: Infrastructure, Delivery and Machine. To identify if logon duration issues are session host or VDI-related, drill down to the delivery layer of the view. At the bottom of the delivery layer page, you will see the past 2 hours of average logon duration on a per delivery group basis. By looking at the trend graph (screenshot below) you can instantly identify if one or more delivery groups are experiencing logon duration bottlenecks. This information allows you to confirm whether the issue is more localized than originally reported or assumed.
After reviewing the Topology View, next you’ll want to go to the XenApp & XenDesktop session display.
To access the XenApp & XenDesktop session display, click View then XenApp & XenDesktop. This page is divided into 3 areas as well: App Servers and Published App & Desktops for Citrix XenApp environments, and Virtual Desktops for XenDesktop environments. You’ll want to navigate to the applicable section for your environment (Published App & Desktops or Virtual Desktops) to troubleshoot further.
These displays include user session data (both past and present) and allow you to track the complete user experience, from the login at the endpoint all the way back to the underlying infrastructure. The data points are presented over the course of the session so you can troubleshoot any issue that takes place during a user’s session.
In the case of logon duration, where long logons are occurring for different users at different times of the day, you’ll want to use the search button at the bottom of the page to filter the page to the time period in question. Depending on the results of the Topology View analysis, it may also make sense to filter by delivery group name.
Once the page is filtered, sort it by the Logon column to bring the sessions with the longest logon durations to the top. You can then use the other columns on the page, such as Machine Name, Username, Group Name and Start Time, to determine whether any patterns can be identified.
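The same filter, sort and pattern check can be applied to exported session data. The sketch below is illustrative only: the `Session` fields mirror the column names mentioned above, but the record layout, function names and the idea of a minimum logon threshold are assumptions, not a Goliath export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    # Hypothetical record; fields mirror the columns on the session display.
    username: str
    machine_name: str
    group_name: str
    start_time: datetime
    logon_seconds: float


def slow_sessions(sessions, start, end, min_logon_seconds, group_name=None):
    """Filter to a time window (and optionally a delivery group), keep
    sessions at or above min_logon_seconds, and sort longest logon first."""
    matched = [
        s for s in sessions
        if start <= s.start_time <= end
        and s.logon_seconds >= min_logon_seconds
        and (group_name is None or s.group_name == group_name)
    ]
    return sorted(matched, key=lambda s: s.logon_seconds, reverse=True)


def pattern_counts(sessions, field):
    """Count how often each value of one column appears among the sessions."""
    return Counter(getattr(s, field) for s in sessions)
```

Calling `pattern_counts(slow, "machine_name")` or `pattern_counts(slow, "group_name")` on the filtered list shows at a glance whether the slow logons cluster on particular machines or delivery groups.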
After reviewing the main display, click the individual line items to drill into the sessions and go to the Logon Duration tab. As mentioned previously, when reviewing high logon duration issues, it is important to look for patterns. One area that can often stand out is the Citrix Delivery Controller brokering of LESD and PLSD. It can be common for the logon script execution and profile load times to be identical in duration in every instance, and this is generally where most of the time was consumed, as seen in the examples below.
In this case, the slow logon issue points directly to the profile load time. In every session that is slow to log on, the profile load and script execution times are identical and are the largest contributor to the overall logon duration. It is typical and expected to see both LESD and PLSD times match when the root cause of logon duration slowness is Citrix UPM profiles loading slowly.
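The pattern described above (LESD and PLSD matching, and together dominating the logon) can be checked per session once the phase durations are known. This is a rough sketch under stated assumptions: the dictionary layout, the phase names other than LESD and PLSD, and the half-second matching tolerance are all hypothetical.

```python
def looks_like_upm_profile_load(phases, tolerance_seconds=0.5):
    """Return True when a session shows the pattern described in the text:
    LESD and PLSD are (nearly) identical and no other phase is longer.

    phases: dict mapping phase name to duration in seconds. It must contain
    the keys "LESD" and "PLSD"; any other keys are treated as other phases.
    """
    lesd = phases["LESD"]
    plsd = phases["PLSD"]
    other_durations = [
        seconds for name, seconds in phases.items()
        if name not in ("LESD", "PLSD")
    ]
    durations_match = abs(lesd - plsd) <= tolerance_seconds
    is_largest = all(plsd >= seconds for seconds in other_durations)
    return durations_match and is_largest


if __name__ == "__main__":
    # Hypothetical phase breakdown for one slow session.
    session_phases = {"Brokering": 1.2, "GPO": 3.5, "LESD": 48.0, "PLSD": 48.0}
    print(looks_like_upm_profile_load(session_phases))
```

Running the check across every slow session in the filtered time window gives a quick count of how many match the UPM pattern and how many need a different line of investigation.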
Root Cause & Recommendations
In the case demonstrated above, long profile load and logon script execution times stand out as the common factor across the long logons occurring in the environment. When these metrics match, it is indicative of Citrix UPM and is not in and of itself an abnormality. The steps listed below are recommended for tuning the environment to improve logon times.
- Review all user profile configurations and ensure that roaming data is minimized. Roaming data inherently requires time to upload and download, resulting in longer logon/logoff times.
- Review the UPM configuration and ensure applicable optimizations, such as caching and profile streaming, are enabled.
- Review the performance of file servers during the times of long logons. Performance issues can delay the downloading and uploading of profile data, resulting in longer logon times.
- Review the performance of domain controllers and delivery controllers during the times of long logons. As with file servers, delays in processing profile and policy data can also result in long logons.
- Analyze and identify if there are resource constraints on the VDI for sessions with long logons. Many of the problematic logons observed during the assessment had either high memory consumption, high CPU consumption or both on the virtual desktop machines the users were logging on to. Resource constraints at the VDI level would also contribute to long logon times.
Given the configuration complexity of user profiles and policies, combined with the possible resource constraints, it is likely that the slow profile processing in the environment results from a combination of several different variables. Following the recommendations above should help to pinpoint areas where tuning can occur and should produce improvements in logon duration.
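For the first recommendation, it can help to see which folders account for most of a roaming profile's size before deciding what to exclude or redirect. The sketch below uses only the Python standard library and is not a Citrix or Goliath tool; the function names are hypothetical, and you would point it at a copy of one user's profile folder on the profile share.

```python
import os


def top_level_sizes(profile_root):
    """Return {entry name: total bytes} for each top-level file or folder
    directly under profile_root. Symbolic links are skipped."""
    sizes = {}
    for entry in os.scandir(profile_root):
        if entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
        elif entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            sizes[entry.name] = total
    return sizes


def largest_entries(sizes, count=5):
    """Return the `count` largest (name, bytes) pairs, biggest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:count]
```

Folders that dominate the result are the natural candidates to review against the profile configuration.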