How We Did IT

How Home Healthcare Provider Finds Root Cause and Solves “Citrix is Slow” Complaints

“By leveraging Goliath, not only were we able to find multiple root causes to slow Citrix logon performance and implement permanent resolutions; but we also were able to deploy proactive alerting giving us an early warning system of poor session performance so we can address issues before users are even impacted.” – CIO of Home Healthcare Provider

Infrastructure

Citrix XenDesktop; Citrix XenApp; Citrix XenServer; ERP/EHR app; VMware vSphere

Challenge

A large home healthcare provider with 30 locations across the U.S. uses Goliath Technologies to anticipate, troubleshoot, and prevent end user experience issues across its hybrid IT infrastructure. The environment consists of Citrix software such as XenDesktop and XenServer, VMware vSphere, and a combined ERP/EHR app.

In the call center, employees spend the entire day scheduling patient appointments and assigning home aides to them. For any patient, calling in to schedule a visit is the first step in their care journey. In many instances, however, this initial interaction was not a positive one. All too often, call center employees told patients, “I apologize for the delay. My system is slow today.” The result was frustrated patients on the other end of the line, patients waiting on hold, and frustrated caregivers who couldn’t reach someone to talk through their schedule. Call center employees were equally frustrated: slow logon times and poor session performance with their EHR application and other systems kept them from delivering a fast, efficient scheduling experience to patients and caregivers. IT heard about these issues continually but, without a purpose-built tool to isolate the root cause, struggled to implement a permanent fix.

Solution

The IT team was struggling to use free tools, like Citrix Director, to get real visibility into the various stages of the logon process and find the root cause of slow logon times. They also lacked historical and trend data to show how big the issue was and how many users were being impacted.

Isolating 3 Root Causes for Slow Logon Times

The IT team in this user story was able to identify three different areas impacting Citrix logon times and successfully address all three.

High CPU on Hypervisor Hosts

The first area they discovered, by watching sessions over several days, was that end users who consistently complained about slow logon times were using old Citrix XenServer hosts. Anyone whose desktop was hosted on those old servers consistently saw a decline in performance because resource utilization rates were high. The IT team could see this by going into the session display window (see Figure 1) and sorting by slow logon times. By clicking into a specific user session, the team was able to quickly see high resource utilization rates on the hypervisor host tab and attribute the slow logon times to them (see Figure 2). The management team used this data to justify the spend to replace those old servers and saw immediate improvement in logon speeds.

Figure 1: Citrix Virtual Apps & Desktop Session Display – This page shows all active sessions; an administrator can filter by high logon times and then sort on columns such as Machine Name, Username, Group Name, and Start Time to start looking for patterns to identify root cause of an issue. Here you can see CPU Use is an issue.

Figure 2: Hypervisor Host Insights – The Session Summary tab displays performance indicators across items like ICA Round Trip Time, CPU usage on the VMs, and CPU use on the Hypervisor Host. Here you can see the CPU spiking on the Hypervisor Host, which is an indicator that there is an issue with the host server.
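The pattern-finding step described above (sort sessions by logon time, then look for a common host) can be sketched in a few lines of Python. This is a minimal illustration only: the session records, host names, and the `summarize_by_host` helper are hypothetical and do not represent Goliath's data model or any Citrix API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical session records (not real data):
# (username, hypervisor_host, logon_seconds, host_cpu_percent)
SESSIONS = [
    ("alice", "xs-old-01", 95.0, 92.0),
    ("bob", "xs-old-01", 88.0, 90.0),
    ("carol", "xs-new-01", 31.0, 45.0),
    ("dave", "xs-new-02", 28.0, 40.0),
    ("erin", "xs-old-02", 102.0, 95.0),
]


def summarize_by_host(sessions):
    """Group sessions by host and return rows of
    (host, avg_logon_seconds, avg_host_cpu_percent, session_count),
    sorted so the host with the slowest average logon comes first."""
    grouped = defaultdict(list)
    for _user, host, logon_seconds, cpu_percent in sessions:
        grouped[host].append((logon_seconds, cpu_percent))

    summary = [
        (
            host,
            mean(logon for logon, _ in rows),
            mean(cpu for _, cpu in rows),
            len(rows),
        )
        for host, rows in grouped.items()
    ]
    return sorted(summary, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for host, avg_logon, avg_cpu, count in summarize_by_host(SESSIONS):
        print(f"{host}: {avg_logon:.1f}s avg logon, {avg_cpu:.0f}% CPU, {count} session(s)")
```

In this made-up sample, the hosts with the slowest average logons are also the ones with the highest CPU use, which is the kind of pattern the team used to tie slow logons to the old servers.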

Issues with Group Policies

The second area causing performance issues was improperly configured group policies, which were negatively impacting one stage of the logon process. When the IT team started drilling into end user sessions with poorly performing logons, they saw an extremely high profile load time (see Figure 3) across many of their users. Based on this information, they were able to reconfigure profiles and update the golden image to decrease the load times and improve overall logon performance.

Figure 3: Logon Duration Details – The Logon tab consists of four main sections displaying a wide variety of data to assist in diagnosing slow logon times. Here you can see profile load is a root cause of the slower logon times.
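The logon breakdown above can be made concrete with a small sketch that finds which stage dominates a logon. The stage names and timings below are illustrative assumptions, not values from this case study, and `dominant_stage` is a hypothetical helper rather than part of any product.

```python
def dominant_stage(stage_seconds):
    """Return (stage_name, share_of_total) for the slowest logon stage.

    stage_seconds maps a stage name to the seconds spent in that stage.
    """
    total = sum(stage_seconds.values())
    if total <= 0:
        raise ValueError("logon breakdown has no recorded time")
    name = max(stage_seconds, key=stage_seconds.get)
    return name, stage_seconds[name] / total


if __name__ == "__main__":
    # Hypothetical breakdown of one slow logon, in seconds.
    slow_logon = {
        "brokering": 2.0,
        "authentication": 3.0,
        "group_policy": 10.0,
        "profile_load": 60.0,
        "interactive_session": 5.0,
    }
    stage, share = dominant_stage(slow_logon)
    print(f"{stage} accounts for {share:.0%} of logon time")
```

Running the same check across many slow sessions and counting how often each stage comes out on top is one simple way to confirm that a single stage, such as profile load, is the common factor.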

High Memory Utilization

The final area the team identified was that some of the VMs had very high memory utilization rates, which caused screens to load slowly or crash altogether. To solve this issue, the IT team added more RAM to the VMs.

Preventing Downtime During Windows 10 Upgrade

Once this healthcare provider had identified the root causes of slow session performance, they wanted to prevent these issues from reappearing, so they added multiple monitoring rules that would alert them when specific thresholds were reached. For example, they could be alerted if CPU spiked on hypervisor hosts so that they could address the problem proactively rather than waiting for end user complaints to pile up.

Figure 4: Monitoring Rules Setup – Monitoring rules are easily set up with configurable thresholds to align to each individual environment.

This ended up being critical for them during their Windows 10 upgrade. After testing in their environment with a small load of users, they felt ready to push live to all users. They pushed live, and Goliath monitoring rules immediately started to alert them that the write cache was filling up. They pulled back the upgrade, adjusted the cache, and pushed live a few days later. What could have been a failed upgrade for end users turned into a successful upgrade with no degradation in performance.
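Conceptually, a threshold-based monitoring rule like the ones described above compares a sampled metric against a configured limit and raises an alert when the limit is reached. The sketch below illustrates that idea only. The `Rule` class, metric names, and threshold values are assumptions made for the example and do not reflect how Goliath implements or configures its rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical threshold rule: alert when metric >= threshold."""
    metric: str
    threshold: float
    message: str


# Illustrative thresholds only; real values depend on each environment.
RULES = [
    Rule("host_cpu_percent", 85.0, "Hypervisor host CPU is high"),
    Rule("write_cache_used_percent", 80.0, "Write cache is filling up"),
]


def evaluate(rules, sample):
    """Return an alert string for every rule whose threshold is reached.

    sample maps metric names to their latest values; metrics missing
    from the sample are skipped rather than treated as breaches.
    """
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value >= rule.threshold:
            alerts.append(
                f"{rule.message}: {rule.metric}={value} (threshold {rule.threshold})"
            )
    return alerts


if __name__ == "__main__":
    # Made-up sample resembling the upgrade scenario: CPU is fine,
    # but the write cache is nearly full.
    latest = {"host_cpu_percent": 50.0, "write_cache_used_percent": 91.0}
    for alert in evaluate(RULES, latest):
        print(alert)
```

A production rule would typically also require the breach to persist for some period before alerting, to avoid noise from momentary spikes; that is omitted here to keep the sketch short.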

Results

They solved the immediate issues, restored confidence in IT, and changed their IT organization from reactive to proactive. By leveraging Goliath Technologies, the IT team at this healthcare provider was able to solve mission-critical issues that were impacting quality of patient care. The team was able to get Goliath up and running in hours, thanks to out-of-the-box functionality that required no scripting or customization to set up alerts and monitoring. Beyond improving overall troubleshooting when issues arose, this organization was able to prove to its CFO how legacy infrastructure components (old XenServers) were negatively impacting end users. As a result, it built the business case to upgrade to new servers and solved logon slowness for many users.

The proactive alerts help IT continually deliver high-performing virtualized desktops and notify the team quickly should any issues arise, so they can be resolved before end users are impacted.