
March 18, 2014

Best Practice for configuring VMware VMkernel ports for NFS traffic

Paul Martin - Atlantis


Configuring VMkernel port groups correctly when presenting NFS datastores is key to getting great performance from NFS traffic. In this article, I’ll be talking specifically about VMware ESX, as that’s the hypervisor we see most of you using in production today, and I’ll explain why we recommend reducing the number of hops on the VMkernel port when deploying Atlantis ILIO.

 

What is a VMkernel Port Group?

The VMkernel networking layer provides connectivity to hosts and handles the standard infrastructure traffic of vSphere vMotion, IP storage, Fault Tolerance, and Virtual SAN. You can set up VMkernel adapters for the standard infrastructure traffic on vSphere standard switches and on vSphere distributed switches.

In the context of this blog, the VMkernel port is used for IP storage (NFS or iSCSI). It usually sits on 10GbE and is typically isolated from all other network traffic for security and performance reasons.
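For reference, here is a minimal sketch of what creating a dedicated VMkernel port for NFS might look like programmatically with pyVmomi (the vSphere Python SDK). The host name, credentials, vSwitch, VLAN, port group name and IP details are all assumptions for illustration; you can of course achieve the same thing through the vSphere Client.

```python
# Illustrative pyVmomi sketch: create a dedicated port group and VMkernel
# adapter for NFS traffic on a standard vSwitch. All names, addresses and
# credentials below are assumptions, not values from this article.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="esx01.lab.local", user="root", pwd="password")
content = si.RetrieveContent()

# Grab the first host in the inventory; adjust the lookup for your environment.
host = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True).view[0]
net_sys = host.configManager.networkSystem

# Port group dedicated to NFS storage traffic on an existing vSwitch.
pg_spec = vim.host.PortGroup.Specification(
    name="NFS-VMkernel",
    vlanId=50,                       # isolated storage VLAN (assumption)
    vswitchName="vSwitch1",
    policy=vim.host.NetworkPolicy())
net_sys.AddPortGroup(portgrp=pg_spec)

# VMkernel adapter on that port group with a static IP on the storage subnet.
nic_spec = vim.host.VirtualNic.Specification(
    ip=vim.host.IpConfig(dhcp=False,
                         ipAddress="192.168.50.11",
                         subnetMask="255.255.255.0"),
    mtu=9000)                        # jumbo frames, if the physical network supports them
net_sys.AddVirtualNic(portgroup="NFS-VMkernel", nic=nic_spec)

Disconnect(si)
```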

During a proof of concept for improving virtual desktop storage performance with Atlantis ILIO, you can sometimes only get a fraction of the resources you were hoping to run your testing with. Most of the time, that is something that can be worked around. One critical element of our best practice, however, is the VMkernel port for NFS traffic. Sometimes during a PoC, usually one where someone just wants to quickly kick the tyres, or test their (‘slow’) golden image on the new super-fast ILIO datastore, the datastore may well be mounted across a VM port group (a VM network VLAN).

What is so bad about that?

Well, these VM port groups are designed for VM networking traffic only. The links are typically 1 GbE, which is not great for storage traffic and the movement of large VMDK files, and there are typically many more hops from the VMkernel port to the NFS storage. The ILIO NFS connection is now competing with other VMs for scarce networking resources. Latency will spike: performance will still be better than the existing storage, but nowhere near as good as it could be.

Ensuring optimal NFS performance on ESX

Let’s take into consideration what the ILIO appliance is actually doing here. It is serving a datastore from RAM back to the host that it resides on. The last thing we want is for that NFS storage traffic to be routed across our network on some randomly chosen VLAN. The chain will only operate as fast as its slowest link, which I can assure you will absolutely not be the ILIO!

For non-persistent In-Memory deployments, ILIO-presented NFS storage differs in one key respect from other types of NFS storage mounts in VMware: it usually does not need to be shared with other ESX hosts, because high availability in non-persistent VDI architectures is provided by the broker. When running persistent, disk-backed ILIO, the ILIO-presented NFS datastores do need to be cross-mounted in order for VMware High Availability (HA) to work effectively, so the user can reconnect to their dedicated desktop. What is the significance of this? Simply put, I want to minimise the number of hops from host to datastore. Why? Because it reduces latency, quite dramatically.
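As an illustration of the mount itself, here is a pyVmomi sketch of presenting an ILIO NFS export as a datastore. The ILIO address, export path and datastore name are assumptions, not values from this article.

```python
# Illustrative pyVmomi sketch: mount an ILIO-presented NFS export as a
# datastore. Connection details, export path and names are assumptions.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="esx01.lab.local", user="root", pwd="password")
content = si.RetrieveContent()
host = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True).view[0]

nas_spec = vim.host.NasVolume.Specification(
    remoteHost="192.168.50.20",     # ILIO NFS interface on the dedicated storage subnet
    remotePath="/exports/ilio_ds",  # NFS export presented by the ILIO
    localPath="ILIO_DS01",          # datastore name as seen by ESX
    accessMode="readWrite")

# In-memory diskless deployments: mount only on the host where the ILIO runs.
# Persistent, disk-backed deployments: repeat this on every host in the HA cluster.
host.configManager.datastoreSystem.CreateNasDatastore(spec=nas_spec)

Disconnect(si)
```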

Incidentally, there are also other NFS tweaks that may be required, outlined in this article by Cormac Hogan from VMware. Bear in mind that these settings are needed for disk-backed deployments when using NFS on the back end. Also, do not forget to change the default adapters to VMXNET 3 so you can take advantage of 10GbE; this is the default if you are using ILIO Deployment Services.
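For illustration only, the sketch below applies a few commonly adjusted NFS advanced settings via pyVmomi. The option keys are standard ESX advanced settings, but the values shown are placeholders; take the real numbers from the linked article and your storage vendor’s guidance.

```python
# Hedged pyVmomi sketch: set NFS-related host advanced settings.
# Values are placeholders only; connection details are assumptions.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="esx01.lab.local", user="root", pwd="password")
content = si.RetrieveContent()
host = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True).view[0]

changes = [
    vim.option.OptionValue(key="NFS.MaxVolumes", value=256),    # placeholder value
    vim.option.OptionValue(key="Net.TcpipHeapSize", value=32),  # placeholder value
    vim.option.OptionValue(key="Net.TcpipHeapMax", value=128),  # placeholder value
]
# Some integer options expect a long on the server side; coerce the values
# if the call rejects a plain int.
host.configManager.advancedOption.UpdateOptions(changedValue=changes)

Disconnect(si)
```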

Minimising the number of hops is conducive to good performance: switching introduces less latency than routing, and the more hops it takes to get from the NFS mount to the ESX host, the more latency is introduced. Granted, this may not be possible in persistent, disk-backed ILIO environments where storage needs to be shared for resilience purposes, but for in-memory diskless deployments it will provide a boost to the overall performance.
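If you want a quick programmatic sanity check that the host has a VMkernel adapter on the same subnet as the ILIO (in other words, no routed hop), something like the pyVmomi sketch below will do it; the ILIO address and connection details are assumptions.

```python
# Illustrative check: is there a VMkernel adapter on the same subnet as the
# ILIO NFS address, so traffic is switched rather than routed?
import ipaddress
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="esx01.lab.local", user="root", pwd="password")
content = si.RetrieveContent()
host = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True).view[0]

ilio_ip = ipaddress.ip_address("192.168.50.20")   # assumed ILIO NFS address

for vnic in host.configManager.networkSystem.networkInfo.vnic:
    ip = vnic.spec.ip
    if not ip.ipAddress:
        continue
    subnet = ipaddress.ip_network("%s/%s" % (ip.ipAddress, ip.subnetMask), strict=False)
    if ilio_ip in subnet:
        print("%s (%s) reaches the ILIO with no routed hop" % (vnic.device, ip.ipAddress))

Disconnect(si)
```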

Here is a really useful post from Chris Wahl on the necessity for NFS VMkernel ports that has in-depth information and some cool graphical representations of what your NFS traffic should look like. The graphics are way better than anything I can cobble together. See below for a diagram of how it should be done (NFS traffic AND graphics):

The VMkernel port configuration is critical to the performance of NFS networking: not having this configured properly will lead to a serious drop in performance. We have observed an increase in performance of almost 80% and a massive 10x decrease in latency by enabling a dedicated VMkernel port. See below the results of this simple change, before and after!

Before (no dedicated storage networking): 4.7K IOPS. After: 33K IOPS and sub-2 ms latency!

 

Best Practice for Atlantis ILIO on ESX: Use VMkernel Ports for NFS Traffic

The case for correctly configuring a VMkernel port group for NFS traffic on the ILIO is clear cut. Reducing hops reduces latency. For in-memory diskless deployments there is no need to share the NFS datastore, so traffic stays internal to the host and is not competing with other traffic for network resources. For persistent workloads, ensure that there is a minimum number of hops – ideally just the one – on your NFS network. The upshot is a massive latency reduction and, with Atlantis ILIO, all-you-can-eat IOPS on a per-host basis.

Related articles:

VMware NFS References

VMware’s NFS Best Practices.

VMware – VMkernel and Storage.

VMware – Setting up VMkernel Networking.
