A few weeks ago I was working with a customer who uses Goliath Performance Monitor to monitor their MEDITECH applications. This customer, a Health Care System, includes two hospitals and a number of other health care facilities.
The Health Care System runs over 150 VMware VMs that deliver their MEDITECH EMR/EHR applications through Citrix XenApp to more than 1,000 users. Their primary concern was preventing end-user complaints caused by application and server issues. By configuring our product, we were able to warn them of impending issues early enough that they could react and solve problems before end users were impacted. This post explains how we did that.
Monitoring Rules & Real-Time Alerts
First, we set up our Monitoring Rules and real-time alerts. Keep in mind that these are all out-of-the-box: the expertise about which conditions to look for is already pre-loaded into the base product, as are the alerts. The rules tell the product what conditions to watch for on the hypervisor, virtual machines, OS, applications and hardware.
For example, we deployed the MEDITECH, VMware and Citrix rules, among many others. The rules also contain the alerts that notify you in real time when a trigger event has occurred, and they can perform proactive remediation actions as well.
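In Goliath Performance Monitor these rules are configured in the product rather than written as code, so the following is only a hypothetical sketch of the general idea: a rule names a metric and a threshold, fires an alert when the threshold is breached for several consecutive samples (so a single spike does not page anyone), and can optionally run a remediation action. The `Rule` class, the `evaluate` function, the 90% threshold and the three-sample window are all illustrative assumptions, not the product's actual rule format or defaults.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Rule:
    """Hypothetical monitoring rule: alert when a metric stays above a threshold."""
    name: str
    metric: str
    threshold: float
    consecutive: int  # number of samples in a row that must breach the threshold
    remediate: Optional[Callable[[str], None]] = None  # optional action hook


def evaluate(rule: Rule, host: str, samples: Sequence[float]) -> bool:
    """Return True (and run the remediation hook, if any) when the latest
    `rule.consecutive` samples all exceed `rule.threshold`."""
    if len(samples) < rule.consecutive:
        return False  # not enough data yet to decide
    recent = samples[-rule.consecutive:]
    if all(value > rule.threshold for value in recent):
        print(f"ALERT [{rule.name}] {host}: {rule.metric} above "
              f"{rule.threshold} for {rule.consecutive} samples")
        if rule.remediate is not None:
            rule.remediate(host)
        return True
    return False


# Illustrative values only (assumed threshold and window).
high_cpu = Rule(name="High CPU", metric="%CPU", threshold=90.0, consecutive=3)
evaluate(high_cpu, "xenapp-01", [40.0, 92.5, 95.0, 97.1])  # sustained breach: alerts
evaluate(high_cpu, "xenapp-02", [95.0, 60.0, 91.0, 93.0])  # dip to 60%: no alert
```

Requiring a sustained breach rather than a single sample is one common way to keep real-time alerts from becoming noise; the right window depends on how often the metric is sampled.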
In the case of the Health Care System, as soon as the monitoring rules and real-time alerts were configured, they began alerting on faults in the environment. After running the product for a month or so, the customer started to receive high CPU utilization alerts on a handful of their MEDITECH XenApp servers. From those alerts, they were able to drill into the issue to determine the root cause.
In the screenshot below, you can see that they first navigated to the CPU page to get an at-a-glance view of average and percent CPU across the entire Citrix and MEDITECH environment.
Out-of-the-Box CPU Monitoring
From this page, the customer was able to compare the alerted machine's CPU against the other machines in the environment. They were also able to drill into the VMware virtual machine metrics by clicking the graph icon, which shows additional key CPU metrics from the hypervisor trended over time.
In the screenshot below, the customer used this display to see that %CPU on one of their XenApp servers hosting MEDITECH had been steadily increasing over the span of an hour before it maxed out at 100%.
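A steady climb like this is also something you can detect before it reaches 100%. As a hypothetical sketch (not how Goliath computes its trend graphs), you can fit a least-squares line to the last hour of %CPU samples and project how long the server has until it saturates. The function names, the 5-minute sampling interval and the synthetic sample series below are assumptions for illustration; they are not the customer's data.

```python
from typing import Optional, Sequence


def slope_per_minute(minutes: Sequence[float], cpu: Sequence[float]) -> float:
    """Ordinary least-squares slope of %CPU against time, in percent per minute."""
    n = len(minutes)
    if n < 2 or n != len(cpu):
        raise ValueError("need at least two paired samples")
    mean_t = sum(minutes) / n
    mean_c = sum(cpu) / n
    cov = sum((t - mean_t) * (c - mean_c) for t, c in zip(minutes, cpu))
    var = sum((t - mean_t) ** 2 for t in minutes)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var


def minutes_until_saturation(minutes: Sequence[float], cpu: Sequence[float],
                             ceiling: float = 100.0) -> Optional[float]:
    """Project when %CPU reaches `ceiling` if the fitted trend continues,
    extrapolating from the most recent sample. Returns None for a flat or
    falling trend."""
    slope = slope_per_minute(minutes, cpu)
    if slope <= 0:
        return None
    return max(0.0, (ceiling - cpu[-1]) / slope)


# Synthetic hour of 5-minute samples climbing 0.75 %/min from 40% to 85%.
minutes = list(range(0, 61, 5))
cpu = [40 + 0.75 * t for t in minutes]
print(minutes_until_saturation(minutes, cpu))  # 20.0
```

A projection like this could back a "CPU trending toward saturation" warning that fires while there is still time to act, rather than waiting for the server to hit 100%.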
Identify Usage Trends
Finally, after noting these key metrics, the customer moved to the Citrix XenApp/XenDesktop Session Display to see how user activity might be affecting the XenApp server. On that display, they searched for all of the sessions on that particular machine during a given time frame. After spotting a couple of users with high CPU, they drilled into those sessions.
To see what each user was doing, they opened the ‘Application Performance’ tab. From that display, the Health Care System determined that one of those users was running six instances of csmagic.exe that were collectively monopolizing the machine's resources. Once the root cause was identified, the help desk killed those processes, which stabilized the machine.
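The key step here was recognizing that several instances of the same process, owned by the same user, added up to the problem even if no single instance looked alarming. The hypothetical sketch below shows that aggregation: group per-process samples by user and image name, sum their %CPU, and list the groups with multiple instances, heaviest first. The row format, the `top_consumers` function and the sample rows (`user_a`, `user_b` and their numbers) are made up for illustration and do not reflect the product's internals or the customer's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One row per process in a snapshot: (user, process image name, %CPU).
ProcessRow = Tuple[str, str, float]


def top_consumers(rows: Iterable[ProcessRow],
                  min_instances: int = 2) -> List[Tuple[str, str, int, float]]:
    """Group processes by (user, image name) and return the groups with at
    least `min_instances` instances as (user, image, count, total %CPU),
    sorted by total %CPU, highest first."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for user, image, cpu in rows:
        key = (user, image.lower())  # Windows image names are case-insensitive
        counts[key] += 1
        totals[key] += cpu
    groups = [(user, image, counts[(user, image)], totals[(user, image)])
              for (user, image) in counts
              if counts[(user, image)] >= min_instances]
    return sorted(groups, key=lambda g: g[3], reverse=True)


# Made-up snapshot: six csmagic.exe instances for one user, light load for another.
rows = [("user_a", "csmagic.exe", 15.0)] * 6 + [
    ("user_b", "WINWORD.EXE", 5.0),
    ("user_b", "explorer.exe", 1.0),
    ("user_b", "explorer.exe", 1.0),
]
for user, image, count, total in top_consumers(rows):
    print(f"{user}: {count} x {image} using {total:.1f}% CPU combined")
```

The same grouping could drive a process-utilization alert that fires on the combined usage of a user's instances rather than on any single process.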
View Processes and Identify Users Burning Resources
These are only a few key screens and metrics, but they had a big impact for this customer. Since implementing Goliath Performance Monitor a few months ago, they have caught multiple runaway CPU and memory leak issues across several locations in their environment. They have also implemented additional alerting on process utilization to get ahead of situations like this in the future.
I hope you found this information useful. If you have any comments or questions, please use the comment box and I will respond quickly.