High Availability for Cloud Infrastructure

This section describes the GigaVUE‑FM High Availability (HA) feature in cloud environments and explains how to configure, upgrade, and troubleshoot it.

Refer to the following topics for details:

■ About GigaVUE‑FM High Availability
■ Supportability Information
■ Recommendations, Rules, Notes, and Limitations
■ Licensing Information
■ Hostname Setups
■ Configure GigaVUE-FM HA
■ GigaVUE-FM HA Landing Page
■ Remove Standby GigaVUE‑FM Instance
■ Disassemble GigaVUE‑FM High Availability Group
■ GigaVUE‑FM High Availability States
■ Fail over Mechanism
■ GigaVUE‑FM High Availability Scenarios
■ Troubleshoot GigaVUE‑FM High Availability Issues

About GigaVUE‑FM High Availability

The GigaVUE‑FM High Availability (HA) feature supports a highly available fabric management environment with minimal interruption. The GigaVUE‑FM HA architecture consists of a minimum of three GigaVUE‑FM instances that run together as a highly available group. The group provides protection from the failure of any one of its members. Communication between the GigaVUE‑FM instances in the group can be encrypted using either a Pre-shared Secret Key or Certificate Based authentication.

The following image shows the high-level architecture of the GigaVUE‑FM HA feature.

GigaVUE-FM allows you to add more than three GigaVUE-FM instances to the HA group, up to a maximum of 7. The hardware requirements and resource allocation process for the additional instances are the same as those for the active and standby instances. The additional instances can be configured either as supporting instances or as active-eligible instances:

Supporting instances

Supporting instances are GigaVUE-FM instances that run minimal services, thereby leaving more resources for OpenSearch services. These instances:

Do not distribute application requests, because all application services are disabled by default.
Do not participate in distributed services.
Provide enhanced scalability.
Are excluded from active instance election.

Active-Eligible instances

Active-Eligible instances are GigaVUE-FMs that distribute application requests and run distributed services. These instances:

Enhance the fault tolerance and availability of the system.
Are eligible for active instance election.

Refer to the following diagram for the roles in the HA group.

Note: When you add a GigaVUE-FM instance as either a supporting instance or a standby instance, the current configuration of that GigaVUE-FM instance will be removed once it is added to the existing GigaVUE-FM HA cluster.

Gigamon cloud solution deployments in GigaVUE-FM HA are supported on both public and private platforms. These deployments can operate in one of the following modes:

■ GigaVUE-FM Orchestrated
■ Third-party Orchestrated (Generic / Integrated)

In either mode, you must first launch GigaVUE-FM and have it running.

GigaVUE-FM Orchestration

Using GigaVUE-FM orchestration, you can deploy GigaVUE Fabric Components through the GigaVUE-FM user interface. After launching GigaVUE-FM and applying the required license, you can deploy components such as the GigaVUE V Series Node, GigaVUE V Series Proxy, and UCT-V Controller.

Once the deployment is complete, GigaVUE-FM continuously monitors the health and status of the deployed components. If any error condition occurs, GigaVUE-FM automatically relaunches or replaces the affected component to ensure system reliability.

Third-party Orchestration

With this method, you deploy GigaVUE Fabric Components using external orchestration solutions. GigaVUE-FM does not manage the deployment process directly, but you can still use it to monitor the health of the deployed components.

To enable monitoring, you need to provide an additional configuration file during deployment. This file allows components such as the GigaVUE V Series Node, GigaVUE V Series Proxy, UCT-V Controller, UCT-V, UCT-C, and GCB to register with the Monitoring Domain in GigaVUE-FM. Refer to the following sections for more details:

■ Deploy GigaVUE Cloud Suite for Third Party Orchestration
■ Deploy UCT-C Solution in Kubernetes
■ Deploy GCB Controller Service and Pods

Supportability Information

The GigaVUE‑FM HA feature is supported on the following platforms:

Orchestration Method	Platforms/Components Supported
GigaVUE-FM Orchestration	AWS, Azure, OpenStack, VMware ESXi, VMware NSX-T, Nutanix
Third-party Orchestration (Generic Mode)	AWS, Azure, OpenStack, VMware ESXi, VMware NSX-T, Nutanix, GCP, UCT-C, GCB
Third-party Orchestration (Integrated Mode)	AWS, Azure, OpenStack

Recommendations, Rules, Notes, and Limitations

Keep in mind the following recommendations, rules and notes when you configure the GigaVUE‑FM HA feature:

Recommendations

■ Do not configure GigaVUE-FM High Availability (HA) across multiple regions or multi-cloud environments. This setup is not recommended and may lead to instability.
■ You can deploy HA across multiple availability zones within the same cloud region. Ensure that you deploy a GigaVUE-FM instance in each availability zone.
■ Use the orchestrated upgrade procedure to upgrade the GigaVUE-FM instances.
■ Configure reverse DNS when you configure FMHA with an FQDN.
■ Create the GigaVUE-FM HA group using DNS names. If the IP addresses of the GigaVUE-FM instances are dynamic and an address changes, GigaVUE-FM HA can no longer manage the individual GigaVUE-FM instances.

Rules and Notes

■ GigaVUE-FM cannot support more than 7 active-eligible nodes because of an underlying constraint in MongoDB.
■ The GigaVUE‑FM instances in the HA group must be identical in terms of system configuration, such as hard disk, memory, and network interfaces, including the domain server, NTP server, and name server settings. The domain server, NTP server, and name server must be configured on all GigaVUE-FM instances.
■ The GigaVUE-FM instances in the high availability group can be accessed using the IPv4/IPv6 address or DNS name that was used to form the High Availability group.

Note: Do not access all the GigaVUE‑FM instances at the same time, as this will impact the performance of the HA group.

■ You can deploy the GigaVUE‑FM HA virtual machines on a WAN link with a maximum latency of 200 ms.
■ You cannot add a GigaVUE‑FM Hardware Appliance and a GigaVUE‑FM virtual machine to the same HA group.
■ The Reload All or Reload any FM commands work only if there is an active GigaVUE-FM instance in the HA group.
■ You must ensure that NTP is configured across all FM HA nodes.
■ The GigaVUE-FM HA cluster behaves like a standalone GigaVUE-FM when a failover occurs during long-running operations, such as Fabric Launch, Monitoring Session Deployment, or Fabric Upgrade.
■ When FMHA is configured with a reverse proxy, access to GigaVUE-FM requires non-SAML authentication, such as local authentication. Use the following sample URL to access GigaVUE-FM:

https://<proxy_ip>/admin

■ If GigaVUE-FM has multiple IPv6 addresses, ensure that FMHA uses the permanent IPv6 address as the source IP. GigaVUE-FM excludes temporary IPv6 addresses from the IP set, so using a temporary address causes connection failures and FMHA formation issues.

Shrinking the size of the HA group:

When reducing the size of an HA group that has both active-eligible and supporting instances, remove the supporting instances first so that the active-eligible instances remain intact in the HA group. Removing active-eligible instances first can make the HA group unstable, depending on the number of active-eligible and supporting instances in the group. For example, a GigaVUE-FM HA group with 7 instances may be composed as follows:

HA group 1	3 active-eligible instances and 4 supporting instances	The recommendation applies to HA group 1 because it has only 3 active-eligible instances. Removing active-eligible instances may cause the HA group to become unstable.
HA group 2	5 active-eligible instances and 2 supporting instances	The recommendation does not apply to HA group 2 because removing active-eligible instances will not impact the stability of the HA group.

For an HA group to be stable, it must have a minimum of two active-eligible instances.
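The removal guidance above can be sketched as a small planning helper. This is only an illustration and not part of GigaVUE-FM: the function name `plan_removals`, the role labels, and the (name, role) data layout are assumptions made for the example. It removes supporting instances first and refuses a plan that would leave fewer than two active-eligible instances.

```python
from typing import List, Tuple

# Stability threshold stated in this section: at least two active-eligible instances.
MIN_ACTIVE_ELIGIBLE = 2


def plan_removals(instances: List[Tuple[str, str]], target_size: int) -> List[str]:
    """Return instance names in the order they should be removed.

    `instances` holds (name, role) pairs; role is "active-eligible" or
    "supporting" (hypothetical labels used only in this sketch).
    """
    to_remove = len(instances) - target_size
    if to_remove < 0:
        raise ValueError("target_size is larger than the current group")

    supporting = [name for name, role in instances if role == "supporting"]
    eligible = [name for name, role in instances if role == "active-eligible"]

    # Supporting instances go first, as recommended above.
    order = supporting[:to_remove]
    remaining = to_remove - len(order)
    if remaining > 0:
        if len(eligible) - remaining < MIN_ACTIVE_ELIGIBLE:
            raise ValueError(
                "plan would leave fewer than two active-eligible instances"
            )
        # Only then fall back to active-eligible instances (last listed first).
        order += list(reversed(eligible))[:remaining]
    return order
```

For HA group 1 in the table above (3 active-eligible and 4 supporting instances), shrinking to 5 or 3 instances removes only supporting instances.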

When you reduce the size of an HA group from 7 instances to 5 or 3 instances, the quorum is recomputed and adjusted accordingly. This ensures that the HA group remains intact with the available set of instances.
When you reduce the size of the HA cluster, allow time after each removal for OpenSearch to rebalance and return to a completely (100%) healthy state before removing the next instance. If you remove instances in quick succession, the health state of the HA group turns red because there is no time for shard rebalancing among the remaining instances.

Log in to the GigaVUE-FM CLI as the root user and run the following command to check the OpenSearch cluster health:

curl -XGET "http://localhost:9200/_cluster/health?pretty"
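The wait between removals can also be scripted. The following is a minimal sketch, assuming the health endpoint shown above is reachable from where the script runs and returns the usual cluster health fields (`status`, `relocating_shards`, `initializing_shards`, `unassigned_shards`). The function names, polling interval, and retry count are illustrative choices, not GigaVUE-FM settings.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# Same endpoint as the curl command above.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def fetch_health(url: str = HEALTH_URL) -> Dict:
    """Fetch the cluster health document and return it as a dict."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def is_fully_healthy(health: Dict) -> bool:
    """True when the cluster is green and no shards are moving or unassigned."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("unassigned_shards", 0) == 0
    )


def wait_until_healthy(
    fetch: Callable[[], Dict] = fetch_health,
    interval_s: float = 30.0,   # assumed polling interval
    max_checks: int = 40,       # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the cluster is fully healthy; return False if it never is."""
    for attempt in range(max_checks):
        if is_fully_healthy(fetch()):
            return True
        if attempt < max_checks - 1:
            sleep(interval_s)
    return False
```

Call `wait_until_healthy()` after removing one instance and proceed to the next removal only if it returns `True`.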

Supporting GigaVUE-FM Instances:
When a supporting instance is added to the HA group, its system information becomes available only after a minute.
When the HA group is upgraded, the application services of the supporting instances are turned on automatically for the upgrade and turned off automatically once the GigaVUE‑FM HA upgrade is complete. The application services are turned off regardless of whether the upgrade succeeds or fails.

Notes:
For supporting instances, the application service is down by default, so a scheduler populates the system information into MongoDB every minute. Hence, any change to the system hostname or GigaVUE-FM version is reflected in the GUI only after a minute.
In suspended mode, MongoDB goes into write-protected mode. Therefore, system-related updates performed on the supporting nodes are not reflected in the GUI.
If a supporting instance is not reachable, the system-related information shown in the GUI (Application Status, Up Time, and so on) reflects the last known values, not the latest values.
Whenever the application services are up and running, events are logged on the Events page with the severity level Info.

Licensing Information

You must install a Prime license on the active GigaVUE‑FM instance to configure a High Availability group.

If the Prime license expires or if you accidentally delete the license, the existing configurations will still be present in the GigaVUE-FMs that are part of the HA group, but you will not be able to perform any new configurations. Moreover, if you disassemble the HA group, you cannot reconfigure the HA group without installing a valid Prime license. You can re-use the same Prime license to form the HA group again.

Hostname Setups

The GigaVUE‑FM instances are not required to be in the same subnet, but they must be able to communicate with each other.

In addition, ensure that the instances have unique host names. You must be able to ping a GigaVUE‑FM instance from the other instances using the hostname or the IP address.

To add a GigaVUE‑FM instance to a GigaVUE‑FM HA group:

■ The GigaVUE‑FM instances in the HA group must be reachable from each other.
■ If host names are used to configure the HA group, the host names of the GigaVUE‑FM instances must be resolvable through a DNS server.
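Before forming the group, the hostname requirements above can be checked with a short script. The sketch below uses only the Python standard library. The helper names are hypothetical, the reverse DNS check reflects the recommendation in this section rather than a hard requirement, and the single-IP check follows the note in Add New Node that a DNS name should resolve to only one IP address.

```python
import socket
from typing import Callable, List


def resolve_all(hostname: str) -> List[str]:
    """Return the distinct IPv4/IPv6 addresses that a host name resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def reverse_lookup(ip: str) -> str:
    """Return the primary host name registered for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def check_hostnames(
    hostnames: List[str],
    resolve: Callable[[str], List[str]] = resolve_all,
    reverse: Callable[[str], str] = reverse_lookup,
) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if len({h.lower() for h in hostnames}) != len(hostnames):
        problems.append("host names are not unique")
    for name in hostnames:
        try:
            ips = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{name}: does not resolve ({exc})")
            continue
        if len(ips) > 1:
            problems.append(f"{name}: resolves to multiple addresses {ips}")
        for ip in ips:
            try:
                ptr = reverse(ip)
            except OSError:
                problems.append(f"{name}: no reverse DNS record for {ip}")
                continue
            if ptr.rstrip(".").lower() != name.rstrip(".").lower():
                problems.append(f"{name}: reverse DNS for {ip} returns {ptr}")
    return problems
```

Run `check_hostnames([...])` from each GigaVUE-FM instance with the host names of all planned group members; the names passed in are whatever your DNS uses.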

Configure GigaVUE-FM HA

To configure the GigaVUE‑FM HA feature, you must have access to at least three authenticated GigaVUE‑FM instances that reside on a trusted network. All GigaVUE‑FM instances must run the same software version. The interfaces in the GigaVUE‑FM instances must be up and must be assigned IPv4/IPv6 addresses. You can also choose to use DNS host names.

Note: You can configure the GigaVUE‑FM HA feature only if you have administrative privileges. You must have a Prime license installed to form the GigaVUE-FM HA group.

Refer to the following sections:

■ Create and Add GigaVUE-FM Instances to HA Group
■ Connect GigaVUE‑FM High Availability to a Monitoring Domain using GigaVUE-FM Orchestration
■ Connect GigaVUE‑FM High Availability to a Monitoring Domain using Third Party Orchestration
■ Integrate GigaVUE‑FM High Availability with UCT-C
■ Integrate GigaVUE‑FM High Availability with Gigamon Containerized Broker

Create and Add GigaVUE-FM Instances to HA Group

To add instances and to configure the HA group:

Go to High Availability. The launch page displays the prerequisites for creating an HA group.

Click Create. In the Group Name field, enter a unique name for the HA group.

Note: Avoid using functionality names as the group alias when creating an FMHA group. Functionality names typically refer to actions or operations within the system and can cause confusion and conflicts. Choose a unique and descriptive name instead.

Choose any one of the following options to encrypt the communication between nodes:

Pre-shared Secret Key – Encrypts communication between GigaVUE‑FM nodes using an application-managed shared secret key. Refer to Encryption of Communication between FMHA Nodes.

Certificate Based – Encrypts communication between GigaVUE‑FM nodes using manually uploaded certificates. Refer to Encryption of Communication between FMHA Nodes.

In the Add GigaVUE-FM Instance section, the details of the GigaVUE-FM instance from which the HA group is created will be fetched automatically.

Edit the Primary Node

On the first instance, click the ellipsis icon and select Edit Node.

Enter the following details (these fields are optional):

Select the Use Tunnel IP check box to enable tunnel configuration. Selecting this check box reveals two additional fields:

• Tunnel IP Address/DNS Name: IP address used for establishing the secure communication tunnel between HA instances. This is required when the instances are reachable only through public or external IP addresses.
• External IP Address / DNS Name is same as Tunnel IP Address / DNS Name: If you select this check box, GigaVUE-FM automatically copies the value from the Tunnel IP field to the External IP field and disables manual editing. If you clear it, the External IP field becomes editable again and retains the previously entered value.

Enter the Entity ID and the IdP Metadata URL.

Click the save icon to save the configuration. Refer to Additional Fields for details about other fields displayed on the HA page.

Add New Node

Click Add New Node to add the GigaVUE-FM instances. Enter the following details:

IP Address/DNS Name - IP address or DNS name of the GigaVUE-FM instance (used for communication between the GigaVUE-FM instances and to configure the HA group). It is always recommended to use the DNS name when creating the GigaVUE-FM High Availability group.

(Optional) Select the Use Tunnel IP check box to enable tunnel configuration. Selecting this check box reveals the same two additional fields that are described in Edit the Primary Node.

Notes:

■ When you add a new node with a Tunnel IP address to an existing HA group, the FMHA table updates to reflect the information in the newly added column.

■ When using a unique DNS name for HA cluster configuration, ensure that it always resolves to only one IP address, not multiple IP addresses.

Role - Role of the GigaVUE-FM instance. Can be one of the following. Refer to Active-Eligible instances for details.

• Active Eligible - Provides resiliency and load distribution.
• Supporting Node - Provides scalability and performance.

Username - GigaVUE-FM GUI Username.

Password - GigaVUE-FM GUI password.

(Optional) Entity ID - Unique alphanumeric ID assigned to the GigaVUE-FM instance.

(Optional) IdP Metadata URL - URL pointing to the IdP’s XML configuration.

Click Save to validate the details entered for the instances.

Use the Add New Node button to continue to add the GigaVUE-FM HA instances. You can click the Edit Node icon to change the external IP address. After adding the instances to the HA group, click Submit. The HA group is configured and will be displayed as follows. Refer to Additional Fields for details about other fields displayed on the HA page.

Notes:

■ When you switch between the Pre-shared Secret Key and Certificate Based options for encrypting communication between nodes, refresh GigaVUE-FM to see the updated tunnel status. Click the link in Tunnel Status to view the health status of the FMHA nodes.
■ When you create an HA group, all existing data on the first configured GigaVUE-FM instance is preserved, so the system can continue to use it after setup is complete. However, the data on the subsequent nodes is lost during setup.

The HA page also displays the following details:

Table 1: Additional Fields
Field	Description
Status and Reachability	Status of the GigaVUE-FM instance. Can be Active, Standby, or Supporting. Refer to Active-Eligible instances for details. Reachability: an icon indicates whether the instance is reachable or unreachable.
Host Name	Host name of the GigaVUE-FM instance.
Application Status	Status of the application services that handle user requests. Up: The service is ready to handle user requests. Down: The service is not ready to handle user requests because one or more underlying services are down. In Progress: A transitional state between Up and Down, indicating that the services are initializing.
Software Version	Software version of the GigaVUE-FM instance.
System Uptime	Time elapsed since the instance came up.
Third Party Authentication URL	Third party authentication URL required for SSO configuration.
Upgrade Status	Upgrade status of GigaVUE-FM. This is displayed only when the upgrade is triggered. Otherwise, this field is empty.

You can also remove instances from the HA group to reduce its size as required. However, when removing instances, ensure that the group retains a quorum of instances.
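This section does not state how the quorum is computed. As a rough guide only, the sketch below assumes the simple-majority rule that MongoDB replica set elections use (more than half of the voting members). Whether supporting instances count as voting members in GigaVUE-FM is not stated here, so `voting_members` is left for you to determine; both function names are hypothetical.

```python
def majority_quorum(voting_members: int) -> int:
    """Smallest number of members that forms a strict majority (assumed rule)."""
    if voting_members < 1:
        raise ValueError("voting_members must be at least 1")
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many members can be lost while a majority is still present."""
    return voting_members - majority_quorum(voting_members)
```

Under this assumption, a group with 3 voting members needs 2 of them available, which is consistent with the minimum of two active-eligible instances mentioned in the shrinking guidance.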

Note: To scale components such as Fabric Health Analytics and topology visualization, and to improve the performance of GigaVUE-FM by fully utilizing the available resources, it is recommended to configure three active-eligible instances and additional supporting instances.

The following table describes the behavior of an HA group with active, standby, and supporting instances:

Task	Scenario	Behavior
Upgrade Operation	Some instances (active-eligible or supporting) are not up.	The upgrade is not initiated.
	Some supporting instances were removed from the HA group after the group was formed.	The upgrade is initiated. The supporting instances that remain in the HA group are upgraded first, followed by the active-eligible instances.
	All instances are up and running.	The upgrade is initiated in an active instance and proceeds in a rolling fashion, starting with the supporting instances and ending with the active-eligible instances.
System Configurations	All instances are active-eligible.	The GUI of every GigaVUE-FM instance is accessible, and you can configure the system from any of them.
System Configurations	HA group with active-eligible instances and some supporting instances.	The GUI is not available for supporting instances because the application service is shut down. The GUI provides an option to turn on the application service and its dependent services, and the status is shown on the HA page. Once the application status of the supporting instance is Up, you can configure system-related settings from the GigaVUE-FM instance you are logged in to.
Background processes - FM Service Distribution	All GigaVUE-FM instances are active-eligible.	Services are distributed and redistributed equally among the instances based on their availability.
Background processes - FM Service Distribution	HA group with active-eligible instances and supporting instances.	Services are distributed and redistributed only to active-eligible instances.
Background processes - Load Balancer	All GigaVUE-FM instances are active-eligible.	All GigaVUE-FM API requests are distributed to all instances in the HA group.
Background processes - Load Balancer	HA group with supporting instances.	All GigaVUE-FM API requests are distributed only to active-eligible instances.
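The upgrade rows in the table above amount to two rules: do not start while any instance is down, and upgrade supporting instances before active-eligible ones. A minimal sketch of that ordering follows. The `Instance` type and function name are hypothetical, and the order among active-eligible instances is simply kept as given because the table does not specify it.

```python
from typing import List, NamedTuple


class Instance(NamedTuple):
    name: str
    role: str    # "active", "standby", or "supporting" (labels for this sketch)
    is_up: bool


def rolling_upgrade_order(instances: List[Instance]) -> List[str]:
    """Return instance names in upgrade order, or raise if any instance is down."""
    down = [i.name for i in instances if not i.is_up]
    if down:
        # Matches the first table row: the upgrade is not initiated.
        raise RuntimeError(f"upgrade not initiated, instances down: {down}")
    supporting = [i.name for i in instances if i.role == "supporting"]
    eligible = [i.name for i in instances if i.role != "supporting"]
    return supporting + eligible
```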

Connect GigaVUE‑FM High Availability to a Monitoring Domain using GigaVUE-FM Orchestration

When fabric nodes need to communicate with GigaVUE-FM using public IP addresses, enable the Use Public IP option when you create the Monitoring Domain. When this option is enabled, GigaVUE-FM pushes its public IP address to the fabric nodes, and the fabric nodes use this IP address to communicate back to GigaVUE-FM. GigaVUE-FM does not initiate communication using its private IP address. This setting is essential in cloud or hybrid environments where private IP addresses are not reachable, such as across cloud platforms. It ensures that GigaVUE-FM can send notifications and accept registrations using an externally accessible IP address. Refer to Supportability Information for the list of supported platforms.

Connect GigaVUE‑FM High Availability to a Monitoring Domain using Third Party Orchestration

Fabric nodes can be configured with either the IP address (remoteIP) or the Fully Qualified Domain Name (FQDN) (remoteAddress) of GigaVUE-FM to establish communication. The node sends registration requests to GigaVUE-FM until it successfully registers. In High Availability mode, if a node fails to register with one GigaVUE-FM instance, it automatically attempts registration with the next available GigaVUE-FM instance in its list until it succeeds. Refer to Supportability Information for the list of supported platforms.

Use the following data to manually configure /etc/gigamon-cloud.conf with registration configurations to register the fabric components with GigaVUE-FM.


Registration:
    groupName: <Monitoring Domain Name>
    subGroupName: <Connection Name/Alias>
    token: <Token>
    remoteAddress: <remoteIPs/FQDN of the GigaVUE-FM>
    remotePort: 443
    sourceIP: <IP address of the fabric node>
    nameServer: <DNS server's IP address>
    usePublicIP: <true/false> 
    preferV6: <true/false><If set to true IPv6 will be preferred over IPv4, used only for DNS resolution>

Notes:

■ Configure either remoteIP or remoteAddress, not both. Using both prevents the node from registering with GigaVUE-FM.
■ Remote IP continues to work as expected. However, if you plan to use an FQDN, update the configuration to use the Remote Address field instead.
■ When configuring a remote address, you can specify multiple IP addresses. However, if you use an FQDN, ensure that you configure only one address.
■ FQDN configuration requires a valid nameserver.
■ Ensure that the FQDN configured for Third Party Orchestration resolves to all IP addresses in the High Availability cluster.
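The rules in these notes can be checked before a node is deployed. The sketch below validates a registration section that has already been loaded into a Python dict. The function name is hypothetical, and the assumption that multiple addresses in remoteAddress are comma-separated is mine; this section does not state the separator.

```python
import ipaddress
from typing import Dict, List


def _is_ip(value: str) -> bool:
    """True if the value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def validate_registration(reg: Dict[str, str]) -> List[str]:
    """Return a list of rule violations; an empty list means the section looks valid."""
    errors = []
    has_ip = bool(reg.get("remoteIP"))
    has_addr = bool(reg.get("remoteAddress"))
    if has_ip and has_addr:
        errors.append("set either remoteIP or remoteAddress, not both")
    if not has_ip and not has_addr:
        errors.append("one of remoteIP or remoteAddress is required")
    if has_addr:
        # Assumption: multiple entries are separated by commas.
        entries = [e.strip() for e in str(reg["remoteAddress"]).split(",") if e.strip()]
        fqdns = [e for e in entries if not _is_ip(e)]
        if fqdns and len(entries) > 1:
            errors.append("an FQDN must be the only remoteAddress entry")
        if fqdns and not reg.get("nameServer"):
            errors.append("FQDN configuration requires a nameServer")
    return errors
```

For example, a section with only `remoteAddress: fm.example.com` (a placeholder name) and no `nameServer` is reported as invalid.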

Integrate GigaVUE‑FM High Availability with UCT-C

UCT-C supports integration with GigaVUE‑FM in high availability mode. You can configure the controller with either a Fully Qualified Domain Name (FQDN) or a comma-separated list of GigaVUE‑FM IP addresses in the Helm configuration. The controller iterates through the provided FQDN or IP list and attempts to register with the master GigaVUE‑FM node. If the GigaVUE‑FM IP changes, you must update the configuration again. This enhancement enables UCT-C to maintain connectivity and recover from GigaVUE‑FM failover events in HA deployments.

Below are the communication parameters:

■ fm_fqdn – Specifies the GigaVUE‑FM HA FQDN value used by the UCT-C Controller for communication with the GigaVUE‑FM HA group. When this parameter is set, the controller uses the IP addresses resolved from the FQDN.
■ fmha_ip_list – Specifies a comma-separated list of GigaVUE‑FM HA IP addresses used by the UCT-C Controller for communication with the GigaVUE‑FM HA group. If fm_fqdn is not set, the controller uses the IP addresses provided in fmha_ip_list by default.
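The precedence between the two parameters, and the way the controller walks the resulting list, can be pictured with the following sketch. It is an illustration of the described behavior, not the UCT-C Controller's code: the function names and the `try_register` callback are hypothetical.

```python
import socket
from typing import Callable, List, Optional


def resolve_fqdn(fqdn: str) -> List[str]:
    """Resolve an FQDN to its distinct IP addresses, in resolver order."""
    infos = socket.getaddrinfo(fqdn, 443, proto=socket.IPPROTO_TCP)
    seen: List[str] = []
    for info in infos:
        addr = info[4][0]
        if addr not in seen:
            seen.append(addr)
    return seen


def fm_candidates(
    fm_fqdn: Optional[str],
    fmha_ip_list: Optional[str],
    resolve: Callable[[str], List[str]] = resolve_fqdn,
) -> List[str]:
    """fm_fqdn wins when set; otherwise fall back to the comma-separated IP list."""
    if fm_fqdn:
        return resolve(fm_fqdn)
    if fmha_ip_list:
        return [ip.strip() for ip in fmha_ip_list.split(",") if ip.strip()]
    return []


def register_with_first(
    candidates: List[str], try_register: Callable[[str], bool]
) -> Optional[str]:
    """Try each address in turn and return the first one that accepts registration."""
    for addr in candidates:
        if try_register(addr):
            return addr
    return None
```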

Integrate GigaVUE‑FM High Availability with Gigamon Containerized Broker

A GigaVUE-FM High Availability group can be integrated with Gigamon Containerized Broker (GCB) through a DNS server when the GigaVUE-FM instances are launched and deployed in the Gigamon Containerized Broker environment.

For GCB to use the high availability feature of GigaVUE‑FM, you must configure the FQDN (Fully Qualified Domain Name) of GigaVUE‑FM.

With a standalone GigaVUE‑FM, GCB uses the FQDN of GigaVUE‑FM, if configured. It falls back to the legacy method of using the configured IP address when:

The FQDN is not configured, or
GCB fails to resolve the FQDN to an IP address.

For example:

GCB Controller YAML file:

env:
  - name: GCB_CNTLR_EXT_IP_DNS
    value: "10.xxx.xx.xx"   # IP address of the DNS server (external to the Kubernetes cluster)
  - name: FM_FQDN
    value: "fm.myorg.com"   # FQDN of GigaVUE-FM, used for DNS lookups

Note: The FQDN identifies either a standalone GigaVUE‑FM or an FMHA cluster that includes multiple GigaVUE‑FM instances. It returns the IP addresses (IPv4, IPv6, or both) for all GigaVUE‑FMs associated with that FQDN.

For more details on the GigaVUE‑FM High Availability configuration, refer to the GigaVUE Administration Guide.

GigaVUE-FM HA Landing Page

When you log in to any of the GigaVUE-FM instances in the HA group, the dashboard page appears. Use the drop-down option in the GigaVUE‑FM header, which is available on specific pages, to view the details pertaining to the selected GigaVUE‑FM instance. For example, the following pages in the GigaVUE‑FM GUI have a drop-down option in the header:

FM Health
IP Resolver
NTP Server for Clock Synchronization

Remove Standby GigaVUE‑FM Instance

To remove or replace a standby GigaVUE‑FM instance in the GigaVUE‑FM HA group:

Go to High Availability.

Click the ellipsis on the standby GigaVUE‑FM instance widget on the High Availability page, as shown in Figure 2.

Figure 2: Remove Standby GigaVUE‑FM Instance

Select the Remove from group option and confirm your changes. The selected standby GigaVUE‑FM instance is removed from the GigaVUE‑FM HA group.

Note: The status of the GigaVUE‑FM HA group changes from Healthy to At Risk. You cannot remove the other standby GigaVUE‑FM instance after the HA status changes to At Risk. If network latency prevents the standby node from being removed correctly, run the command fmcs remove force in the GigaVUE-FM CLI of the standby node.

Disassemble GigaVUE‑FM High Availability Group

To completely disassemble the GigaVUE‑FM HA group:

Go to High Availability.

Note: You cannot remove an active GigaVUE‑FM instance or disassemble the GigaVUE‑FM HA group by logging in from a standby GigaVUE‑FM instance.

Click the Delete HA group option on the High Availability page and confirm your changes. The GigaVUE‑FM HA group is disassembled, and each GigaVUE‑FM instance becomes a standalone GigaVUE‑FM instance.

Note: These steps disassemble the GigaVUE‑FM HA group completely. The database of each GigaVUE-FM HA group member (active, standby, and supporting instances) is erased and reinitialized with the default database. If you performed orchestrated configuration flows (such as Flexible Inline Arrangement or Intent Based Orchestration), you must manually remove and recreate them unless you back up the GigaVUE‑FM database before choosing Delete HA Group.

If you cannot access the GUI of the active GigaVUE-FM instance, run the following CLI command on all the GigaVUE-FM instances to disassemble the HA group:
/opt/fmcs/bin/fmcs leave force
Before disassembling the instances, take a backup of the configuration. After the GigaVUE‑FM instances are disassembled from the HA group, the configurations present in the instances will be completely removed.

GigaVUE‑FM High Availability States

The GigaVUE‑FM HA state depends on the status of the three GigaVUE‑FM instances. The following table lists the various states of the GigaVUE‑FM HA group. You can view the HA group state from Administration > High Availability in the GigaVUE‑FM GUI.

Table 3: High Availability States
State	Number of GigaVUE‑FM Instances	Description
Healthy	Three GigaVUE‑FM instances are up and running	One GigaVUE‑FM instance is in active state. Other two instances are in standby state.
At Risk	Two GigaVUE‑FM instances are up and running	One GigaVUE‑FM instance is in active state. Another GigaVUE‑FM instance is in standby state. The third GigaVUE‑FM instance has either not joined the HA group or has left the HA group. Note: You must recover the standby GigaVUE-FM instance within 3 days (72 hours). Failure to do so will cause the instance to move out of the High Availability group.
Incomplete	One GigaVUE‑FM instance is up and running	Only one GigaVUE‑FM instance is in active state. The other two GigaVUE‑FM instances have either not joined the HA group or have left the HA group.
Standalone	One GigaVUE‑FM instance is up and running	HA is not configured on the GigaVUE‑FM instance.
Suspended	Two or more GigaVUE‑FM instances are up and running	HA is configured, but an active GigaVUE‑FM instance is yet to be elected. Note: The GigaVUE-FM high availability group can move to a Suspended state from the At Risk state when the HA group has only one active-eligible and reachable GigaVUE-FM instance. Events will not be captured for the status of the GigaVUE-FM instance that went down (which causes the HA group to move into a suspended state) as there is no active/leader instance in the High Availability group.

When FMHA moves from the At Risk state to the Suspended state, the reachability status of the node that went down (and caused the move to Suspended) is not audited, because writes are not accepted while there is no active/leader instance in the DB cluster.
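Table 3 can be read as a small decision function. The sketch below covers only the three-instance group that the table describes; the function name and parameters are hypothetical, and GigaVUE-FM itself determines the state from more information than this.

```python
def ha_state(ha_configured: bool, instances_up: int, active_elected: bool) -> str:
    """Map the conditions in Table 3 to a state name for a three-instance group."""
    if instances_up < 1:
        raise ValueError("at least one instance must be up to report a state")
    if not ha_configured:
        return "Standalone"
    if not active_elected:
        # HA is configured, but there is no active/leader instance yet.
        return "Suspended"
    if instances_up >= 3:
        return "Healthy"
    if instances_up == 2:
        return "At Risk"
    return "Incomplete"
```

For example, two running instances with an elected active instance map to At Risk, while the same two instances without an elected active instance map to Suspended.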


Fail over Mechanism

The active GigaVUE‑FM instance in the high availability group may fail at times, in which case one of the standby instances takes over and becomes the active instance. This process is called failover.

The following table provides the reasons for failover:

Reason for Failover	Description
Reloading the active GigaVUE‑FM instance	An active GigaVUE‑FM instance is reloaded (using the Reboot option) to bring back the HA group to healthy state again.
Planned downtime of the active GigaVUE‑FM instance	An active GigaVUE‑FM instance is brought down due to various reasons, for example to upgrade to a newer software version.
Monitoring session deployment during HA leader switchover	The new leader automatically redeploys Monitoring Sessions that failed during the switchover.
Node registration during HA leader switchover	Registration requests may temporarily fail or be delayed. Fabric nodes automatically retry registration. The new leader’s IP address is configured as the RMQ notification target on all fabric nodes.
Certificate handling or fabric deployment/upgrade during HA leader switchover	The newly elected leader resumes certificate handling and continues any in-progress fabric deployment or upgrade from the last known state. No manual intervention is required.

GigaVUE‑FM High Availability Scenarios

The High Availability page displays the current state of the GigaVUE‑FM HA group. When a failover occurs, the HA group state changes in the GUI.

The following table lists the GUI changes for the various scenarios:

Scenario

Changes in GUI

What happens to the High Availability page immediately after a failover?

The High Availability page may not update immediately or may not show all the GigaVUE‑FM instances.

Note: Refresh the page after a minute to view the new active GigaVUE‑FM instance. Otherwise, the page updates automatically after 5 minutes.

What happens to an active GigaVUE‑FM instance when it fails (either by itself or if failover is triggered manually)?

● One of the standby GigaVUE‑FM instances changes to the active state. The GigaVUE‑FM instance that was initially in the active state changes to the standby state. It takes a few seconds for this transition.

● The GigaVUE‑FM GUI of the new active instance will have all the menus and dashboards.

What happens to the GigaVUE‑FM instances that were previously in standby state?

● One of the standby GigaVUE‑FM instances changes to the active state.

● The other standby GigaVUE‑FM instance remains in the standby state.

What happens to the embedded devices when a new Active instance takes over?

There are no changes to the devices except that they are being managed by the new active GigaVUE‑FM instance.

How do you trigger a failover?

Click on the 'Reboot' option on the current active GigaVUE‑FM instance to trigger a failover.

Troubleshoot GigaVUE‑FM High Availability Issues

Use the following table to troubleshoot issues that you might encounter while working with the HA feature.

Problem

Solution

Unable to add a license to GigaVUE‑FM HA group after a failover

Reason: Using the MAC Address in the About page to generate the license

Always use the Challenge MAC Address in the Licenses page of the active GigaVUE‑FM instance to generate licenses. Add the licenses to the HA group.

Unable to join the GigaVUE‑FM HA group after changing the password

Reason: Not logging out of GigaVUE‑FM after changing the password and before joining the HA group

Always log out of GigaVUE‑FM after changing the password, and log in again with the new password before joining the HA group.

Orchestrated Upgrade of GigaVUE-FM Instances in HA Group

Orchestrated upgrade of GigaVUE-FM instances in a High Availability group is similar to upgrading a standalone GigaVUE-FM instance. You can upgrade using an image that is located on an external image server, or you can use GigaVUE-FM as the image server. Refer to the Upgrade GigaVUE-FM section in the GigaVUE-FM Installation and Upgrade Guide for more details.

Prerequisites

Before upgrading the GigaVUE-FM instances in a High Availability group, ensure the following:

The High Availability group must be in a healthy state.
The latency between the GigaVUE-FM instances must be less than 100 ms.
The config disk space allocated to GigaVUE-FM must have a sustained transfer rate above 100 MB/s. A low disk rate impacts both file sync and installation.
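
The prerequisites above can be expressed as a simple pre-flight check. The following is a minimal sketch and not part of GigaVUE-FM: the function name, its parameters, and the state string "healthy" are hypothetical, and the measured values are assumed to come from your own monitoring tools.

```python
from typing import List

# Thresholds taken from the prerequisites listed above.
MAX_LATENCY_MS = 100.0       # latency between instances must be below this
MIN_DISK_RATE_MB_S = 100.0   # sustained config-disk transfer rate must exceed this


def check_upgrade_prerequisites(group_state: str,
                                latencies_ms: List[float],
                                disk_rate_mb_s: float) -> List[str]:
    """Return the unmet prerequisites; an empty list means all checks pass.

    All inputs are hypothetical measurements supplied by the caller.
    """
    problems = []
    if group_state.lower() != "healthy":
        problems.append(f"HA group state is '{group_state}', expected 'healthy'")
    for latency in latencies_ms:
        if latency >= MAX_LATENCY_MS:
            problems.append(f"latency {latency} ms is not below {MAX_LATENCY_MS} ms")
    if disk_rate_mb_s <= MIN_DISK_RATE_MB_S:
        problems.append(
            f"disk rate {disk_rate_mb_s} MB/s is not above {MIN_DISK_RATE_MB_S} MB/s"
        )
    return problems
```

An empty result suggests that the upgrade can be started. Any returned message names the prerequisite that should be resolved first.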

Upgrading GigaVUE-FM Instances in HA Group

To upgrade the GigaVUE-FM instances in a High Availability group from the GUI, click the Upgrade option from the User icon. Always trigger the upgrade from the active GigaVUE-FM instance.

Note: Do not run any scripts (for example, scripts for fetching polling statistics of maps) or queries (for example, database or stats queries) prior to or during the upgrade operation, because this drains GigaVUE-FM's memory and CPU resources and impacts the upgrade operation.

The following is the sequence of events that occur in the background:

Active GigaVUE-FM Instance: The software image download is triggered and the image is downloaded.
Active GigaVUE-FM Instance: Syncs the downloaded image to one of the standby GigaVUE-FM instances (the first standby instance).
First Standby Instance: Once the image is synced, the first standby instance is upgraded and rebooted.
Second Standby Instance: The second standby instance then syncs the image, and is upgraded and rebooted.
Active GigaVUE-FM Instance: Once the standby instances are upgraded, the active GigaVUE-FM instance is upgraded and rebooted.
A new active GigaVUE-FM instance is elected while the previous active GigaVUE-FM instance reboots.

Note: As orchestrated upgrade is a background process, device management tasks will be carried out seamlessly. The overall time consumed for the upgrade process is around 60 minutes and the total management loss time during orchestrated upgrade is around 1 minute. This is the time required to elect the new active GigaVUE-FM instance when the current active GigaVUE-FM instance reboots post upgrade.
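
The ordering described above (standby instances first, active instance last) can be illustrated with a short sketch. This is an illustration only, not how GigaVUE-FM implements the orchestration: the function name, the role strings, and the instance names are hypothetical.

```python
from typing import Dict, List


def upgrade_order(roles: Dict[str, str]) -> List[str]:
    """Return instance names in upgrade order: standbys first, active last.

    `roles` maps a hypothetical instance name to "active" or "standby".
    """
    standbys = [name for name, role in roles.items() if role == "standby"]
    actives = [name for name, role in roles.items() if role == "active"]
    if len(actives) != 1:
        raise ValueError("expected exactly one active instance")
    # The active instance goes last, so management is interrupted only once,
    # while a new active instance is elected.
    return standbys + actives
```

For a group given as {"fm-1": "active", "fm-2": "standby", "fm-3": "standby"}, the function returns fm-2, fm-3, and then fm-1.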

Access GigaVUE-FM Active Instance in case of Failover

In a GigaVUE-FM high availability environment, you can perform the GigaVUE-FM and device configurations only from the active GigaVUE-FM instance. Standby instances provide minimal configuration options. In case of failover, it is important to be aware of the IP addresses of the three GigaVUE-FMs and also the DNS host names of the GigaVUE‑FM instances so that you can access the active GigaVUE‑FM instance.

To overcome this restriction and still access the GigaVUE‑FM instances, you can use one of the following options:

Use Load Balancer
Assign a DNS Name for the GigaVUE‑FM instances

Use Load Balancer

A load balancer distributes traffic across a number of servers. Integrating a load balancer with GigaVUE‑FM forwards the traffic to the active GigaVUE‑FM instance. The load balancer performs health checks on GigaVUE‑FM and forwards the traffic destined to the external GigaVUE‑FM IP address to the GigaVUE-FM high availability group.

To use a load balancer for forwarding the traffic, do the following:

Deploy the load balancer.
Configure the load balancer to access the active instance in the GigaVUE-FM High Availability group. This is accomplished using the following GET API endpoint exposed by GigaVUE-FM:

https://<FM-IP>/api/v1.3/fmHa/status

Based on the returned response codes, configure the load balancer so that healthy servers (the active instance) are those that return 200 when the endpoint is queried. The remaining servers are considered unhealthy (standby instances). With this configuration, requests are always forwarded to the active GigaVUE-FM instance.

Sample Response for GET https://<FM-IP>/api/v1.3/fmHa/status

If FM Role == active/standalone,

return 200 with Payload {"role" : "active" } or {"role" : "standalone" }

Else if FM Role == Standby

return 202 with Payload {"role" : "Standby" }

Else if FM Role == support, suspended, unknown

return 400 with Payload {"role" : "Unknown" }

Note: A 5xx response can be returned if the API Gateway on GigaVUE-FM is unavailable or down.

Configure the load balancer with external GigaVUE-FM IP address as the DNS IP, for example, myfm.com.

Note: If you add or remove nodes in the GigaVUE-FM HA group, update the corresponding IP addresses of the nodes in the load balancer.
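
The health-check rule described in this section (only a 200 response from the status endpoint marks an instance as healthy) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the endpoint is assumed to be reachable without additional authentication headers, and mapping an unreachable instance to 503 is a choice made here, not documented GigaVUE-FM behavior.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def is_active(status_code: int) -> bool:
    """Health rule from the text: only 200 marks an active (or standalone) instance."""
    return status_code == 200


def probe_status(fm_ip: str, timeout: float = 5.0,
                 context: Optional[ssl.SSLContext] = None) -> int:
    """Query the HA status endpoint and return the HTTP status code.

    Returns 503 if the instance cannot be reached at all (an assumption made
    here so that unreachable instances are treated as unhealthy).
    """
    url = f"https://{fm_ip}/api/v1.3/fmHa/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status       # 200 (active/standalone) or 202 (standby)
    except urllib.error.HTTPError as err:
        return err.code              # for example 400, or 5xx from the gateway
    except (urllib.error.URLError, OSError):
        return 503                   # connection refused, timeout, DNS failure
```

A load balancer applies the equivalent of is_active(probe_status(ip)) to each member and sends traffic only to the member for which the result is true.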

Assign DNS Name for the GigaVUE-FM IP

You can assign a DNS name to the three GigaVUE‑FM IP addresses, which helps to access the GigaVUE-FM instance in case of failures. Consider a DNS name, myfm.com that has the IP addresses of the three GigaVUE‑FM instances of the high availability group.

If you type the DNS name of the GigaVUE-FM, myfm.com, the DNS server returns the IP addresses of the three GigaVUE‑FM instances, and:

the Dashboard page of the active GigaVUE-FM instance appears, or
the High Availability page of the standby GigaVUE-FM instances may appear if the active instance is not reachable, as in the case of a failover. From the High Availability page of the standby instance, you can navigate to the GigaVUE‑FM active instance.

This ensures that the GigaVUE-FM High Availability group is always accessible.
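
For scripted access, the same idea can be sketched in code: resolve the DNS name to all of its addresses, then pick the one that reports itself as active. This is an illustrative sketch only. The function names are hypothetical, and the check used to decide whether an address is the active instance is supplied by the caller (for example, a query against the HA status endpoint).

```python
import socket
from typing import Callable, List, Optional


def resolve_all(dns_name: str, port: int = 443) -> List[str]:
    """Return every distinct IP address the DNS name resolves to, in resolver order."""
    infos = socket.getaddrinfo(dns_name, port, proto=socket.IPPROTO_TCP)
    addresses: List[str] = []
    for info in infos:
        ip = info[4][0]              # sockaddr tuple; first element is the address
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def find_active(addresses: List[str],
                is_active_instance: Callable[[str], bool]) -> Optional[str]:
    """Return the first address the caller-supplied check reports as active, else None."""
    for ip in addresses:
        if is_active_instance(ip):
            return ip
    return None
```

With a name such as myfm.com that holds the three instance addresses, find_active(resolve_all("myfm.com"), check) returns the address of the active instance, or None while a failover is still in progress.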