Understanding Performance of NetScaler VPX

In the Solutions Lab, we utilize NetScaler technology in all of our projects for a simple reason: it is often much easier to deploy VPX instances, or leverage SDX systems running VPX instances, than to track down actual appliances. VPX instances allow us to be much more flexible in the lab environment. So, understanding how a VPX performs in supporting XenDesktop ICA users has become important.

How We Recently Tracked NetScaler’s Performance

As a start, we deployed a default VPX configuration with 2 vCPUs and 2 GB of memory, running release 10.5 (virtualization optimized) with an 8 Gb license. To track performance, we used HDX Insight to monitor the VPX. In HDX Insight, the focus was on:

  1. Packet CPU Utilization
  2. InUse Memory
  3. ICA RTT times

Guidance from the NetScaler experts told us that the ICA RTT times should stay below 250 ms for optimal performance, and Packet CPU Utilization should be around 80%. A single VPX was used for testing, so no load balancing was involved. Finally, SSL was used only for the ICA connections.
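Those two guidance thresholds can be captured in a small helper that classifies an HDX Insight snapshot. This is a hypothetical illustration of the check, not a Citrix API; the function name and structure are our own, while the 250 ms and 80% limits come from the guidance above.

```python
# Guidance thresholds from the NetScaler experts (see text above).
RTT_LIMIT_MS = 250      # ICA RTT should stay below 250 ms
CPU_TARGET_PCT = 80     # Packet CPU Utilization should sit around 80%

def within_guidance(packet_cpu_pct: float, ica_rtt_ms: float) -> bool:
    """Return True if an HDX Insight snapshot is inside the
    recommended operating envelope."""
    return ica_rtt_ms < RTT_LIMIT_MS and packet_cpu_pct <= CPU_TARGET_PCT

# Two snapshots from the first test run described below:
print(within_guidance(60, 11))     # 500 users  -> True
print(within_guidance(100, 1300))  # 1000 users -> False
```

In practice these numbers would come from HDX Insight (or the NetScaler stats interfaces) rather than being typed in by hand; the helper just makes the pass/fail criterion explicit.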

To Put the VPX Under Load

We started a 1,000-user test. At 500 users, the system was at a Packet CPU Utilization of 60% and an RTT of 11 ms, well within the desired ranges. However, at 1,000 users, utilization was at 100% and RTT times were at 1.3 seconds.

A second run was done, taking more snapshots. Again, at 500 users the utilization was at 60% and RTT times were 13 ms. As we passed 750 users, things got interesting. At 761 users utilization was close to 100%, but RTT was at 95 ms. At 779 users the RTT jumped to 260 ms.

So at 750 users, we were over 90% Packet CPU Utilization, and RTT times were still in the double digits, but both quickly rose to unacceptable levels as we moved past 750 users. From this, a default VPX with 2 vCPUs and 2 GB of memory is best utilized at 500-750 users. In one test we did push the VPX to 1,300 users; no users were dropped, but the RTT climbed over 2 seconds. Response was slow, but everyone stayed connected. In all of our tests, the InUse Memory never exceeded 60%, even under the heaviest load. The default VPX uses a 2K certificate; we tested one run with a 4K certificate, which increased InUse Memory by about 10% but had no other effect on performance.
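The knee in the curve can be located roughly by linear interpolation between the two snapshots that straddle the 250 ms guidance line (95 ms at 761 users and 260 ms at 779 users). The sketch below is just that arithmetic, written out as a hypothetical helper; it is not part of any Citrix tooling, and it assumes RTT grows roughly linearly over this narrow span.

```python
def interpolate_crossing(u1: float, rtt1: float,
                         u2: float, rtt2: float,
                         limit: float = 250.0) -> float:
    """Linearly interpolate the user count at which RTT reaches `limit`,
    given two (users, rtt_ms) snapshots on either side of it."""
    frac = (limit - rtt1) / (rtt2 - rtt1)
    return u1 + frac * (u2 - u1)

# Snapshots from the second run: 95 ms at 761 users, 260 ms at 779 users.
users_at_limit = interpolate_crossing(761, 95, 779, 260)
print(round(users_at_limit))  # ~778 users
```

This lands in the high 770s, consistent with the observation that the default VPX tips over just past the 750-user mark.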

We Now Had an Understanding of the Default VPX.

But, What if We Increased It to 4 vCPUs and 4 GB of Memory? Or 6 vCPUs and 6 GB of Memory?

Using the 750-user watermark, we re-ran the tests at 4 vCPUs and 6 vCPUs. As the table shows, each increase reduced the load the 750 users placed on the VPX by about 50%. The only metric that did not change much from 4 to 6 vCPUs was the RTT times.
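As a rough rule of thumb extrapolated from those runs (our own assumption, not a Citrix sizing formula), the Packet CPU load of the fixed 750-user workload halves with each configuration step. A quick sketch of that extrapolation:

```python
def est_cpu_at_750(base_cpu_pct: float, steps_up: int) -> float:
    """Estimate Packet CPU % for the 750-user workload after `steps_up`
    configuration bumps (2 -> 4 -> 6 vCPU), assuming each bump cuts
    the load by ~50% as observed in the tests."""
    return base_cpu_pct / (2 ** steps_up)

# Starting from the ~90% observed on the 2 vCPU / 2 GB VPX:
for steps, cfg in enumerate(["2 vCPU / 2 GB", "4 vCPU / 4 GB", "6 vCPU / 6 GB"]):
    print(f"{cfg}: ~{est_cpu_at_750(90, steps):.1f}% Packet CPU")
```

Treat this as an illustration of the observed trend only; the real numbers came from HDX Insight snapshots, and the halving will not continue indefinitely as configurations grow.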

So How Far Can 6 vCPU Be Pushed?

We ran a 6 vCPU configuration to 1,724 users with an 80% Packet CPU Utilization and an RTT of 62 ms.

A Couple of Things to Note.

Snapshots of HDX Insight were taken at different times to gather the information. Different runs may produce slightly different results, depending on when the snapshot is taken, as shown in the change from 761 to 779 users in the earlier test. Also, the entire environment was rebooted between runs. As with any test environment, your mileage may vary depending on many different influences, but a 2 vCPU, 2 GB default VPX can support 700-750 ICA users, and a 6 vCPU, 6 GB VPX can scale to 1,700 ICA users; both figures can be affected by things like the workload, load balancing within the VPX, and how SSL is configured.

These are lab-based numbers, and they are higher than what you would want to run in real-world deployments. They do not include the impact of load balancing or logon storms, topics we hope to cover in the future. The users were logged in over a 50-minute interval, and the VPX was running stand-alone on a dual-socket, 8-core, 2.6 GHz server with 192 GB of memory.

For a Better Understanding of How vCPUs Are Utilized and Licensed, Refer To:

http://support.citrix.com/article/CTX139485.

A special thanks to Michele and Eddy for doing the excellent work they did in running all the tests.

Source: blogs.citrix.com