If you have a Citrix environment, there will come a time when you have to troubleshoot an issue or two, which is an obvious understatement. As an IT administrator and consultant, I often found that my monitoring solution would give me performance metrics but wouldn’t help me find the root cause of a performance issue. It just didn’t collect enough detail, and it couldn’t deliver that detail in a way that let me avoid an issue or objectively prove root cause. Frustrated, I thought of all the missing data, blind spots, and failure points and developed a series of 26 questions that would determine, before I purchased a solution, whether the vendor could get the specific details and render them in such a way that I would be able to isolate root cause and resolve the issues I had experienced. In other words, whether the vendor WOULD be able to troubleshoot an issue when the time came.
Every monitoring vendor says that they can monitor everything (HDX, logon process, end user experience), but the devil is in the details, and there are a lot of details. So, I tell the vendor to put aside the marketing feature presentation and just tell me if they can do the following 26 things. The answers are Yes/No because I want problems avoided or fixed. There is no credit for “partial” SLAs in IT.
Vendor Scorecard: Citrix Troubleshooting XenApp 6.5 – XenDesktop 7.x
Check the Box if This Capability Exists in the Product
1) Can you schedule automatic checks that confirm Citrix and application availability in production from each and every location users sign in from, before they arrive or attempt to log on, without manual management or intervention? Can you do this from all locations simultaneously from a central console?
2) Is there a way to identify an availability issue by site, geography, or individual user via a live dashboard?
3) Is the tool intelligent enough to identify client-side alerts such as licensing errors, gateway configuration problems, and other endpoint failures that will lead to false positives if not identified?
4) Can you trigger alerts based on thresholds, events, and failures, and then execute automated remediation actions to resolve those problems?
5) Does the solution allow for dynamic grouping and policy assignment to ensure that proactive rules are assigned accordingly upon migration (e.g. to a hybrid cloud)?
6) Can the solution show me, from a single dynamic screen, when a system-level issue will have an upstream impact on user experience?
7) Can the technology monitor and alert on file level changes, including file/folder growth, cut/copy/paste actions and specific file edits?
8) Does the technology have the ability to gather data and analytics from the entire stack so an administrator can determine an end user’s experience and trace that back to the underlying infrastructure element impacting it?
9) Can you see your entire Citrix environment in one graphical view that shows the dependencies and relationship between Citrix & supporting IT elements, so at a glance you can easily identify the location or IT element that is experiencing a performance issue?
10) Can you isolate the root cause of a logon duration issue regardless of where it manifests? Some examples include Citrix Receiver startup on the endpoint, waiting for session host readiness, execution of a specific policy, DNS resolution during ICA/HDX connection creation, or any of the other 33+ stages.
11) Can you drill into session logon to know how long the domain controller takes to obtain GPOs and determine if they are applicable?
12) Can you troubleshoot the root cause of random or consistent slowness and isolate it to an unreliable Wi-Fi connection or the connection speed available to the Citrix endpoint?
13) When troubleshooting, can the technology correlate slow ICA Round Trip Time directly to storage performance on a specific datastore, or to CPU Ready issues on the hypervisor host that the session host resides on?
14) Can you analyze and compare logon, connection, and session performance for both on-premises and cloud workloads to identify and troubleshoot end user experience issues?
15) Does the solution support rapid deployment for environments where cloud bursting is leveraged for temporarily increasing available compute resources?
16) Can the technology provide visibility into ICA Round Trip Time, ICA Latency, Network Latency, and available session bandwidth? Can it help to identify the root cause of session slowness as either the network connection, the underlying systems infrastructure performance, or application performance?
17) Does the technology provide me with visibility into user activity via ICA channels and application usage statistics? For example, does it allow me to correlate high Thinwire bandwidth utilization directly to the user watching YouTube from Chrome in session?
18) Can performance counter data from monitored resources be graphically presented, historically trended, and analyzed for root cause analysis of performance-related issues?
19) Can your solution generate a report of all users’ sessions from the last 6 months? Can it identify whether the sessions were active or idle?
20) Does the technology provide the visibility to troubleshoot down to the endpoint? This includes performance monitoring, alerting, and event log analysis.
21) Can you furnish me with at least two references that will speak to how they used your product to isolate root cause and troubleshoot issues or avoid them completely?
22) Can you generate reports to prove that an issue is an infrastructure, resource, or Citrix issue?
23) Can you generate root cause reports that show faults or errors causing application failures, print driver problems, user GPO faults, and configuration problems such that strategic, targeted remediation activities can be initiated?
24) Does the solution provide a comprehensive log (syslog and event log) correlation engine for parsing and analyzing log data to find root cause?
25) Can you put alert resolution instructions or a knowledge base in the product so that level 1 support, when an alert is triggered, can resolve issues usually solved by level 3?
26) Can you implement self-healing, auto-remediation actions, such as restarting a service, killing a process, or running a script or application, without customization?
Total Score /26 Vendor Name:
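If you are comparing several vendors, tallying the scorecard by hand gets tedious. The following is a minimal Python sketch of one way to do it, assuming you record the question numbers each vendor answered with a clear “Yes”. The class name, field names, and the vendor name are all hypothetical and only for illustration; they are not part of any product.

```python
from dataclasses import dataclass

TOTAL_QUESTIONS = 26  # the scorecard above has 26 Yes/No questions


@dataclass
class VendorScorecard:
    """Hypothetical helper for tallying one vendor's answers."""

    vendor: str            # vendor name as written on the scorecard
    yes_answers: set[int]  # question numbers answered with a clear "Yes"

    def score(self) -> int:
        # Only a clear "Yes" earns a point; "partial" answers are simply
        # not recorded, so they count as "No".
        return len({q for q in self.yes_answers if 1 <= q <= TOTAL_QUESTIONS})

    def gaps(self) -> list[int]:
        # Question numbers the vendor could not answer "Yes" to.
        return [q for q in range(1, TOTAL_QUESTIONS + 1)
                if q not in self.yes_answers]

    def summary(self) -> str:
        return f"{self.vendor}: {self.score()}/{TOTAL_QUESTIONS}"


# Example with made-up answers for a made-up vendor.
card = VendorScorecard("Vendor A", {1, 2, 4, 26})
print(card.summary())
print("Follow up on questions:", card.gaps())
```

The list of gaps is often more useful than the total, because it tells you which questions to press the vendor on in the next conversation.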
Using these questions and this overall “paper evaluation” process allowed me to make the best recommendations for my clients during my time as a Citrix consultant. The key is to question vendors deeply about how the product will assist in troubleshooting and resolving Citrix end user experience and performance issues. This is especially true today, as there are a multitude of products on the market that can address XenApp and XenDesktop monitoring requirements.
I know the reference question is old school, but I would really suggest you speak with customers who used a vendor’s technology to troubleshoot an issue. As a matter of fact, the more references the better, and three is generally the magic number. When you talk to a reference, find out what their specific issue was and how they used the product to determine root cause. Also ask about the support they received with deploying and using the product. Quality of support is as important as functionality in the grand scheme of things. The technical people you work with should be experts in the product and in the platforms that you are using the product to monitor.