Manage Not-reachable Nodes in Cluster

GigaVUE-FM allows you to add and manage standalone nodes and clusters as described in Manage GigaVUE® Nodes and Clusters. Occasionally, the nodes in a cluster may become unreachable due to various reasons.

Until software version 6.0.00, when a node in a cluster managed by GigaVUE-FM is removed from the cluster using the no cluster enable CLI command, or when any node becomes unreachable to the cluster leader due to other reasons such as power outage, the cluster leader marks the operational status of the node as 'Down' in CLI. After the next config sync cycle, GigaVUE-FM does not manage the node anymore. An event is triggered and corresponding alarms are raised. When a node in a cluster managed by GigaVUE-FM is removed using the GigaVUE-FM GUI (Edit Cluster page), GigaVUE-FM manages the node as a standalone node. The Operational Status of the node is marked as 'Down' in CLI.

When a cluster that contains a node with operational status 'Down' is added to GigaVUE-FM, GigaVUE-FM removes the node from the cluster and no longer manages the node. If the node comes back up, it is added to GigaVUE-FM only if it is not already managed by GigaVUE-FM. Otherwise, you must manually:

  • Add the node as a stand-alone node in GigaVUE-FM
  • Add the node to the cluster

Therefore, it is important to keep track of the events and alarms to know about the nodes that have been removed from the cluster, especially in large deployment scenarios.

Starting from software version 6.1.00, when a node becomes unreachable, it is assigned one of the following operational statuses, depending on the reason:

Reason for node becoming unreachable | Operational Status
Node is removed intentionally from the cluster, either using the no cluster enable CLI command or using the GigaVUE-FM GUI (by navigating to the Edit Cluster page). | Left
Node is offline due to issues such as power outage, management interface being down, or crash. | Not-reachable
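
The mapping in the table above can be sketched in Python as follows. This is an illustrative model only: the UnreachableReason enum, its member names, and the operational_status function are hypothetical names chosen for this sketch and are not part of any GigaVUE-FM API or CLI.

```python
from enum import Enum


class UnreachableReason(Enum):
    """Hypothetical labels for the reasons listed in the table above."""
    CLI_NO_CLUSTER_ENABLE = "removed with the no cluster enable CLI command"
    GUI_EDIT_CLUSTER = "removed from the Edit Cluster page"
    POWER_OUTAGE = "power outage"
    MGMT_INTERFACE_DOWN = "management interface down"
    CRASH = "crash"


# Intentional removals map to 'Left'; all other reasons map to 'Not-reachable'.
INTENTIONAL_REMOVALS = {
    UnreachableReason.CLI_NO_CLUSTER_ENABLE,
    UnreachableReason.GUI_EDIT_CLUSTER,
}


def operational_status(reason: UnreachableReason) -> str:
    """Return the operational status described for version 6.1.00 and later."""
    return "Left" if reason in INTENTIONAL_REMOVALS else "Not-reachable"


if __name__ == "__main__":
    for reason in UnreachableReason:
        print(f"{reason.value}: {operational_status(reason)}")
```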

GigaVUE-FM manages the not-reachable member nodes as part of the cluster and does not remove the nodes from the cluster, that is, GigaVUE-FM manages the not-reachable nodes in faulty/offline state. This allows easy management of the not-reachable nodes. GigaVUE-FM displays clusters with not-reachable nodes in the Physical Nodes page and in the Chassis page.

To view only clusters with not-reachable nodes from the Physical Nodes page:

  1. Navigate to the Physical Nodes page.
  2. Click the Filter button.
  3. Scroll down and enable the following toggle option: Show only the clusters with not-reachable nodes

Refer to the following notes:

  • If the node is already managed by GigaVUE-FM, then the Host Name and IP Address of the not-reachable node are displayed in the GUI.
  • If you add a cluster with a not-reachable node to GigaVUE-FM, then the IP address and Host Name of the not-reachable node are displayed in the following format: clusterid_boxid. The IP address of the not-reachable node is updated once the node becomes reachable. The Role of the not-reachable node is displayed as 'Unknown'. Software Version and other properties are displayed as NA.

Rules, Notes, and Limitations

Refer to the following rules, notes and limitations:

  • The only write operation you can perform on a not-reachable member node is removing it from the cluster. You cannot perform any other write operation on the not-reachable node.
  • You can configure Email notification (instant or digest) for the events triggered for the not-reachable nodes. Refer to the Configure Email Notifications section in the GigaVUE Administration Guide for details.
  • The not-reachable member nodes are captured in the physical and Fabric Health Analytics dashboards.
  • You can import and export clusters with not-reachable member nodes.
  • You must perform a factory reset of the not-reachable nodes when:
    • removing a not-reachable node from a cluster and adding it to a new cluster
    • removing a not-reachable node from a cluster and adding it as a leader node of another cluster
  • When rebooting a cluster with not-reachable nodes, GigaVUE-FM prompts you to skip the not-reachable nodes. Reboot will not proceed until you skip the not-reachable nodes.

Remove Offline Chassis

GigaVUE-FM allows you to remove the not-reachable nodes from the cluster. To remove the not-reachable nodes:

  1. From the Physical Nodes page, select the cluster from which you want to remove the not-reachable nodes.
  2. Select Actions > Edit Cluster. The Edit Cluster - Canvas appears.
  3. Right-click the not-reachable node, and click Delete.

To perform this action:

  • You must be a read-write user with the Infrastructure Management category.
  • You must remove the associated ports and map configuration of the not-reachable member nodes before removing the nodes from the cluster. Otherwise, GigaVUE-FM does not allow you to remove the nodes.
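
The two prerequisites above can be read as a simple pre-check. The following Python sketch models that check; it is not GigaVUE-FM code. The NodeRemovalRequest class, its fields, and removal_blockers are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeRemovalRequest:
    """Hypothetical summary of the facts that decide whether removal is allowed."""
    user_is_read_write: bool    # the user has read-write access
    user_has_infra_mgmt: bool   # the user has the Infrastructure Management category
    ports_configured: int       # port configurations still associated with the node
    maps_configured: int        # map configurations still associated with the node


def removal_blockers(req: NodeRemovalRequest) -> List[str]:
    """Return the unmet prerequisites; an empty list means removal may proceed."""
    blockers = []
    if not (req.user_is_read_write and req.user_has_infra_mgmt):
        blockers.append("read-write user with Infrastructure Management category required")
    if req.ports_configured > 0:
        blockers.append("remove the associated port configuration first")
    if req.maps_configured > 0:
        blockers.append("remove the associated map configuration first")
    return blockers


if __name__ == "__main__":
    request = NodeRemovalRequest(True, True, ports_configured=2, maps_configured=0)
    for blocker in removal_blockers(request):
        print("Blocked:", blocker)
```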

Alarms and Health Status of Not-reachable Nodes

The following alarms are triggered in the Alarms page depending on the status of the not-reachable nodes:

State | Alarm
Member node in a cluster is not-reachable to the cluster leader. The alarm is cleared when the member node is reachable. | Cluster Member Not-Reachable
Member node is reachable by the cluster and added to the same cluster. | Cluster Member Online

When a not-reachable node becomes part of another cluster:

  • The config sync operation fails.
  • The cluster to which the node originally belonged displays the member node as not-reachable.
  • The health status of the new cluster becomes red.

You must remove the not-reachable member node from the cluster to which it originally belonged. Only then does the config sync operation succeed.

Upgrade Cluster with Not-reachable Nodes

When upgrading a cluster from software version 6.0.00 to 6.1.00:

  • Member nodes with Operational Status as Down: Cluster upgrade is successful in both CLI and GigaVUE-FM. After the upgrade, the operational status of the member nodes is 'Left'.

When upgrading a cluster from software version 6.1.00 to a later version, if the cluster contains:

  • Member nodes with Operational Status as Not-reachable: Cluster upgrade will not proceed until the not-reachable nodes are skipped. Refer to the following table for the various scenarios:

    Type of Upgrade | Cluster State | Upgrade Status
    Immediate Upgrade | All nodes in the cluster are reachable. | Upgrade will succeed.
    Immediate Upgrade | One or more nodes in the cluster are not-reachable. | Enable the "Skip not-reachable nodes to upgrade" check box for the upgrade to succeed.
    Scheduled Upgrade | All nodes in the cluster are reachable. | Upgrade will succeed.
    Scheduled Upgrade | All nodes in the cluster are reachable. However, it is possible for the nodes to go to not-reachable state at the scheduled upgrade time. | Enable the "Skip not-reachable nodes to upgrade" check box for the upgrade to succeed.
    Scheduled Upgrade | One or more nodes in the cluster are not-reachable. | Enable the "Skip not-reachable nodes to upgrade" check box for the upgrade to succeed.

Note: After the cluster upgrade operation is completed, you must upgrade the not-reachable nodes to the required software version. Failure to do so results in a version mismatch conflict.
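
The upgrade scenarios for software version 6.1.00 and later reduce to one rule: the upgrade proceeds if every node is reachable when the upgrade runs, or if the skip option is enabled. The Python sketch below models that rule and also lists the nodes that would need a separate upgrade afterward. The function name, the status strings, and the node names are assumptions made for this example; they do not correspond to a GigaVUE-FM API.

```python
from typing import Dict, List, Tuple


def plan_cluster_upgrade(
    node_status: Dict[str, str], skip_not_reachable: bool
) -> Tuple[bool, List[str]]:
    """Return (will_proceed, nodes_to_upgrade_separately).

    node_status maps a hypothetical node name to its operational status at the
    time the upgrade actually runs (for a scheduled upgrade, the scheduled time).
    """
    not_reachable = sorted(
        name for name, status in node_status.items() if status == "Not-reachable"
    )
    if not not_reachable:
        return True, []      # all nodes reachable: upgrade proceeds
    if not skip_not_reachable:
        return False, []     # upgrade does not proceed until the nodes are skipped
    # Skipped nodes must be upgraded later to avoid a version mismatch.
    return True, not_reachable


if __name__ == "__main__":
    status = {"node-1": "Up", "node-2": "Not-reachable"}
    print(plan_cluster_upgrade(status, skip_not_reachable=False))
    print(plan_cluster_upgrade(status, skip_not_reachable=True))
```

For a scheduled upgrade, enabling the skip option in advance covers the case where a node that is reachable today becomes not-reachable at the scheduled time.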

For GigaSMART Signature image upgrade, you must skip the not-reachable nodes for the upgrade to succeed.

Type of Upgrade | Cluster State | Upgrade Status
Immediate Upgrade | All nodes in the cluster are reachable. | Upgrade will succeed.
Scheduled Upgrade | One or more nodes in the cluster are not-reachable. | Enable the "Skip not-reachable nodes to upgrade" check box for the upgrade to succeed.

Backup and Restore of Cluster with Not-reachable Nodes

During GigaVUE-FM backup and restore operation, if not-reachable nodes are part of the cluster:

  • The not-reachable member nodes are also backed up. On restoring the backup file, the not-reachable member nodes are restored to their original state.
  • During the next config sync cycle after restore, GigaVUE-FM updates the status of the not-reachable member nodes and the cluster.

For cluster backup and restore operation, refer to the following table:

Operational State During Backup | Operational State During Restore | Final Operational State
All nodes are in 'up' state | A few nodes are in 'left' or 'not-reachable' state | Operational status of the nodes in 'left' or 'not-reachable' state remains 'left' and 'not-reachable', respectively.
All nodes are in 'up' state | A few nodes are removed using the no chassis box id command | Operational status of the removed nodes is configured as 'left'.
A few nodes are in 'left' or 'not-reachable' state | Same state | No change.
A few nodes are in 'left' or 'not-reachable' state | A few nodes are removed using the no chassis box id command | Operational status of the removed nodes is configured as 'left'.
A few nodes are in 'left' or 'not-reachable' state | Some nodes come up | Operational status of the nodes shows as 'up'.
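
The backup and restore table can also be expressed as a per-node lookup, which makes it easy to see that the final status follows the node's condition at restore time. The sketch below is a hypothetical Python model of the table rows only. The label 'removed' stands for a node removed using the no chassis box id command and is an assumption of this example, as are the names FINAL_STATE and final_operational_state.

```python
# Keys: (state of the node during backup, state of the node during restore).
# 'removed' is a label invented here for "removed using no chassis box id".
FINAL_STATE = {
    ("up", "left"): "left",
    ("up", "not-reachable"): "not-reachable",
    ("up", "removed"): "left",
    ("left", "left"): "left",
    ("not-reachable", "not-reachable"): "not-reachable",
    ("left", "removed"): "left",
    ("not-reachable", "removed"): "left",
    ("left", "up"): "up",
    ("not-reachable", "up"): "up",
}


def final_operational_state(during_backup: str, during_restore: str) -> str:
    """Look up the final status; raises KeyError for combinations the table omits."""
    return FINAL_STATE[(during_backup, during_restore)]


if __name__ == "__main__":
    for (backup, restore), final in FINAL_STATE.items():
        print(f"backup={backup}, restore={restore} -> final={final}")
```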