Reflections on Extensive Nutanix Foundations

Introduction

I’m Hicham Benouari, a seasoned Professional Services Engineer and Nutanix Consultant at Metis IT, with 8 years of specialized experience in the industry. Throughout my career, I have successfully executed more than 50 Nutanix deployments, ranging from small-scale implementations to complex, large-scale environments, utilizing a variety of hypervisors. My extensive hands-on experience has given me deep insights into the challenges and nuances of Nutanix deployments. In this blog post, I’m excited to share the invaluable lessons I’ve learned along the way to guide you through your future deployments with confidence. Let’s begin by laying the groundwork for a successful implementation.

Nutanix Foundation is the tool used by Nutanix engineers to create a cluster and inject information about the cluster, host, and controller VMs. In the past, this process was done through the CLI and node by node. When all the nodes were installed with the correct versions and had the correct IP addresses, the cluster create command was given to create the cluster. Nutanix has since developed Foundation with a user-friendly graphical interface that consolidates all the necessary commands for cluster creation. Foundation is a fantastic tool that has evolved from being a buggy and unreliable installation tool that could only be run through a Linux VM to a reliable tool with minimal issues. It now has Windows and macOS apps available, as well as the option to use a virtual machine. However, there are still potential issues that can arise, which may not be Foundation’s fault but can be time-consuming to troubleshoot.

Preparation

Preparation is key before using the Foundation app. Start by creating a document that lists everything the deployment requires. Nutanix provides a handy document for this called the Questionnaire, in which you can fill in all the necessary information about your new cluster. This includes:

  • IP addresses of the CVMs and hosts. These IP addresses need to be in the same subnet.
  • The VLAN tag, which is required if the CVM and host are assigned to a physical NIC connected to a trunk port on the switch and they are not using the native VLAN
  • Whether LACP is being used
  • IP addresses of the IPMI interfaces
  • Host names
  • Cluster Name
  • AOS and AHV (or another hypervisor) version. This part can be tricky, as you need to check the Compatibility and Interoperability Matrix to be sure that the versions you are installing are supported together. For example, if the AHV version is not compatible with the chosen AOS version, Foundation will still run, but later on your cluster will have a lot of critical issues and you will need to run Foundation again.
  • CVM RAM. This depends on what kind of workload the cluster will be running. Check this very handy link to see how much RAM is needed.
  • Time zone of the cluster
  • Which replication factor will be used
  • DNS IPs
  • NTP IPs
  • Whether you are going to use the customer’s network or a standard switch connected directly to the nodes. I will explain how to make use of your own standard switch in the next section.
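
Part of this checklist can be sanity-checked before you arrive on site. The snippet below is a minimal sketch and not part of Foundation or the Questionnaire: the function name `check_plan` and the example addresses are my own assumptions. It only verifies that the planned CVM and host addresses sit in the same subnet, that none is used twice, and that there is one CVM address per host.

```python
import ipaddress


def check_plan(subnet, cvm_ips, host_ips):
    """Return a list of problems found in the planned CVM/host addressing.

    Hypothetical helper for illustration; it only inspects the plan itself.
    """
    network = ipaddress.ip_network(subnet, strict=False)
    all_ips = list(cvm_ips) + list(host_ips)
    problems = []

    # Every CVM and host address must fall inside the shared subnet.
    for ip in all_ips:
        if ipaddress.ip_address(ip) not in network:
            problems.append(f"{ip} is outside {network}")

    # The same address must not be planned twice.
    for ip in sorted({ip for ip in all_ips if all_ips.count(ip) > 1}):
        problems.append(f"{ip} is assigned more than once")

    # Each node needs exactly one CVM address and one host address.
    if len(cvm_ips) != len(host_ips):
        problems.append("number of CVM and host addresses differs")

    return problems


if __name__ == "__main__":
    # Example values only; replace them with the data from your own plan.
    for problem in check_plan(
        "10.0.0.0/24",
        ["10.0.0.11", "10.0.1.12"],
        ["10.0.0.21", "10.0.0.11"],
    ):
        print(problem)
```

An empty result means the plan passed these three checks. It says nothing about whether the addresses are actually free on the customer’s network.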

Network dependencies

The Foundation tool is very dependent on the network, and even a small network issue can cause the process to halt. Having a standard switch on hand can be valuable when troubleshooting network issues during the Foundation process. Many problems that arise during deployment can be traced back to the network, but identifying the root cause can be challenging.

One way to test the network is to disconnect the nodes from the customer’s network and connect them to a standard switch along with your laptop. By running Foundation in this setup, you can determine whether the network configuration is causing the problem. If Foundation fails, review the configuration and the versions of AHV and AOS. If it succeeds, the issue probably lies within the customer’s network, and you will need to work with a network engineer to troubleshoot it. Keep in mind that if you use a standard switch, you need to skip the automatic cluster creation on the cluster page. If the cluster has already been created, all the CVMs and hosts must be shut down gracefully before moving from the standard switch to the customer’s network. I always skip cluster creation until the nodes are connected to the customer network and are working (pingable).
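
The "pingable" check is easy to script once the nodes are on the customer network. This is a minimal sketch of my own, not a Nutanix tool: `ping_once` and `unreachable` are hypothetical names, and it assumes the operating system’s `ping` command is available on your workstation.

```python
import platform
import subprocess


def ping_once(ip, timeout_s=2):
    """Send one ICMP echo with the system ping command; True if it succeeded."""
    # Windows uses -n for the packet count, Linux and macOS use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def unreachable(ips, probe=ping_once):
    """Return the addresses that did not answer the probe, in input order."""
    return [ip for ip in ips if not probe(ip)]


if __name__ == "__main__":
    # Example addresses only; use the CVM and host IPs from your own plan.
    planned = ["10.0.0.11", "10.0.0.12", "10.0.0.21", "10.0.0.22"]
    missing = unreachable(planned)
    print("All nodes answer" if not missing else f"No reply from: {missing}")
```

The probe is passed in as a parameter so the same loop can be reused with a different check, for example a TCP connection test. Be aware that a ping exit code is only a rough signal and a firewall can block ICMP even when the node is healthy.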

When using a standard switch, ensure that the nodes have RJ45 ports not only for IPMI but also for data. Nutanix will initially pool all ports (except the IPMI port) into the virtual switch vs0, but you can adjust the port usage after creating the cluster. Consider bringing SFPs with you: if some nodes do not have RJ45 ports for data, you will need SFPs and copper cables to connect them.

There are scenarios where using a standard switch is not feasible, such as deploying with LACP enabled, when the customer uses VLAN tagging, or when the customer uses CVM network segmentation (this can be skipped during Foundation and enabled later on). In these situations, connect at least one of the necessary ports on each node to the customer’s network switch. Ask the customer to configure the network properly, and troubleshoot together if it does not work.

In summary, having a standard switch can be a helpful tool during deployment for troubleshooting network issues. It’s essential to understand when and how to use it effectively to ensure a smooth Foundation process. Here is a very informative link that can help you understand the network dependencies.

Once everything is collected, we are ready to start the Foundation process.

Deployment

Starting the deployment

When launching the Foundation tool, the first page presents several important fields that require careful attention and completion. In this section, we will cover these fields in detail.

First of all, the Foundation tool can save the configuration so it can be used later. Manually filling in every field each time you use Foundation is very time-consuming. The first option in Foundation lets you import a prefilled configuration file, which saves you that time. You can use this tool to pre-fill the configuration and save it.

Another crucial consideration is whether you will be utilizing RDMA passthrough for the CVMs. This technology, supported by certain switches, allows nodes to communicate directly with each other’s main memory without involving the processor, cache, or operating system. Enabling RDMA passthrough during the foundation process is essential, as setting it up later can be complex and time-consuming. Additionally, it is important to ensure that the NIC of each node also supports RDMA passthrough.

Moving on to the next step in the preparation, it is important to check if LACP is configured. If so, ensure that the switch is set to LACP with a fallback option. If this feature is not available on the switch, enable LACP at a later stage to avoid potential errors during the foundation process.

Step 6 involves verifying if the switch ports for CVM and host traffic are tagged. If they are, the VLAN must be specified in this step to prevent any issues during the foundation.

Steps 7 and 8 are straightforward, requiring the input of the subnet and IP address of the gateway for both the CVM and IPMI networks.

Step 9 is also crucial, especially if the network being deployed on is a routed network. In this case, it is necessary to check the “Skip this validation” checkbox to proceed with the deployment successfully.

If it’s not a routed network, this step can help with adding virtual NICs to your workstation (the one that is connected to the IPMI network), as you will need a temporary IP in the CVM network and a temporary IP in the IPMI network. If you are connected to the customer network, make sure these IP addresses are not in use.
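
Picking the temporary addresses can be done against your own plan. The sketch below is an illustration under assumptions: `free_addresses` is a hypothetical helper, and the subnet and addresses are example values. It returns the first addresses in a subnet that do not appear in the list of addresses you already know about. It cannot see other devices on the customer network, so confirm the result with the customer.

```python
import ipaddress
from itertools import islice


def free_addresses(subnet, used, count=1):
    """Return the first `count` host addresses in `subnet` not listed in `used`.

    Hypothetical helper: "used" means "known to the plan", nothing more.
    """
    taken = {ipaddress.ip_address(ip) for ip in used}
    hosts = ipaddress.ip_network(subnet, strict=False).hosts()
    candidates = (str(host) for host in hosts if host not in taken)
    return list(islice(candidates, count))


if __name__ == "__main__":
    # Example values only: gateway plus two planned node addresses.
    known = ["192.168.5.1", "192.168.5.2", "192.168.5.4"]
    print(free_addresses("192.168.5.0/29", known, count=2))
```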

When you press Next, Foundation will check whether the connections for your workstation have been filled in correctly (it only checks the entered data, not the real connection).

Node information

This page is where you configure the nodes with their IP addresses. If the nodes already have the Discovery OS or a working and running CVM, you can try the discover nodes option to see if your nodes are discoverable.

If that is not the case, choose to add the IPMI nodes manually. This comes in two flavors: if the IPMI IPs are already configured, choose “I have configured their IPMIs to my desired IP addresses”; otherwise, choose “I will provide their IPMIs’ MAC addresses. The IPMIs and this Foundation are in the same L2 domain”. The MAC addresses can be found on the sticker underneath the IPMI port of the node. For this option, ensure that the network allows IPv6 link-local unicast.
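
To see why a MAC address and IPv6 link-local traffic belong together, it helps to know that many devices derive their link-local address from the MAC using the modified EUI-64 scheme. Whether a given IPMI interface, or Foundation itself, uses exactly this derivation is an assumption on my part, so treat the sketch below as background only. The function name and the MAC address are made up for the example.

```python
import ipaddress


def mac_to_link_local(mac):
    """Derive the modified EUI-64 IPv6 link-local address for a MAC address."""
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets in MAC address, got {len(octets)}")

    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe between the two halves of the MAC address.
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]

    interface_id = int.from_bytes(bytes(eui64), "big")
    # Combine the fe80::/64 prefix with the 64-bit interface identifier.
    return str(ipaddress.IPv6Address((0xFE80 << 112) | interface_id))


if __name__ == "__main__":
    # Made-up MAC address, as you would read it from the sticker.
    print(mac_to_link_local("00:11:22:33:44:55"))
```

If the switch filters IPv6 link-local traffic, addresses of this kind are unreachable no matter how they were derived, which is why the network requirement above matters.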

Choose how many nodes you will add and press Add. Now you can start filling in your prepared information.

A couple of very handy tools are under the Tools button:

  • Range autofill is a handy tool that can help you fill in your data faster. Use it only if the IP addresses are in sequence; otherwise you will end up with wrong IP addresses in the fields.
  • Reorder blocks helps you to reorder the blocks in the desired order.
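
The idea behind range autofill can be sketched in a few lines. This is not Foundation’s own code, only an illustration with hypothetical function names and example addresses. It also shows why a plan with gaps does not fit the tool: the generated list simply counts upward from the first address.

```python
import ipaddress


def autofill(start_ip, count):
    """Generate `count` consecutive addresses starting at `start_ip`."""
    start = ipaddress.ip_address(start_ip)
    return [str(start + offset) for offset in range(count)]


def is_sequential(ips):
    """True if the planned addresses are exactly what autofill would produce."""
    return bool(ips) and list(ips) == autofill(ips[0], len(ips))


if __name__ == "__main__":
    # Example values only.
    print(autofill("10.0.0.11", 4))
    # A plan with a gap: filling this range automatically would be wrong.
    print(is_sequential(["10.0.0.11", "10.0.0.12", "10.0.0.14"]))
```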

Cluster Information

This page is where the cluster will be preconfigured. Any preparations we have made can be easily incorporated here. CVM network segmentation means separating CVM-to-CVM storage traffic from all other kinds of traffic. This is usually done for security purposes and is achieved by creating an extra interface called eth2 inside every CVM. If CVM network segmentation is required, it can be enabled on this page or at a later stage when the cluster is complete.

By using this method, the nodes will be properly set up and assigned the correct IP addresses. Once the nodes are set up, shut them down and switch them to the customer’s network. You can then use the cluster create command to establish the cluster.
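
When you create the cluster by hand, the command is run from one of the CVMs and takes a comma-separated list of all CVM IP addresses. The snippet below is a small sketch that only builds that command string from your plan so you can paste it without typos. The helper name and the addresses are my own, and it deliberately leaves out any optional flags, so check the Nutanix documentation for the options your AOS version supports.

```python
import ipaddress


def cluster_create_command(cvm_ips):
    """Build the `cluster -s <ips> create` command line from the CVM addresses."""
    if not cvm_ips:
        raise ValueError("at least one CVM IP address is required")
    # Validate each entry; ip_address raises ValueError on a malformed address.
    checked = [str(ipaddress.ip_address(ip)) for ip in cvm_ips]
    if len(set(checked)) != len(checked):
        raise ValueError("duplicate CVM IP address in the list")
    return f"cluster -s {','.join(checked)} create"


if __name__ == "__main__":
    # Example addresses only; use the CVM IPs from your own plan.
    print(cluster_create_command(["10.0.0.11", "10.0.0.12", "10.0.0.13"]))
```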

AOS and Hypervisor

This is also an easy step, as we have already prepared the correct AOS and hypervisor versions. Starting from AOS version 6.8, AHV is no longer included with the AOS package and needs to be downloaded separately. If AHV is not provided during this step, Foundation will start but will fail very quickly. Remember to check the Compatibility and Interoperability Matrix to see if the chosen versions are compatible.

Also fill in the prepared CVM vRAM.

Security

This step has a couple of optional fields. If you want, you can already fill in the cluster password. If you don’t, you will be prompted to set it the first time you log in with the default password.

Cluster lockdown is disabled by default. Because we will be using SSH for troubleshooting during the deployment, I keep it off. If needed, it can be enabled later through Prism Element.

IPMI

In this step you need to fill in the usernames and passwords for the IPMI interfaces. A handy tip is to click on Tools and select the vendor; Foundation will then fill in the username and password for you if they are still the defaults. For Nutanix nodes, you can find the password on a sticker on the back of the node.