Troubleshoot Cluster Issues

This section lists issues that may occur when you create or upgrade clusters, and the steps to troubleshoot them. Refer to the following sections for details:

■   Unable to Add a Box ID to a Cluster
■   Unable to Upgrade Nodes in a Cluster

Unable to Add a Box ID to a Cluster

Problem Description: When you remove a device from a cluster, the box ID of the device remains stored in the running configuration of the master device. If you assign the same box ID to another device and add that device to the cluster, GigaVUE-FM rejects the box ID and displays the following error message:

Error Message in GigaVUE-FM

Box-id [<unusable-boxIds>] occupied by offline chassis, choose different box-id(s)

Corrective Action: Either assign a different box ID, or perform the following steps to add the same box ID to the cluster:

1. Run the following command to remove the box ID from the master device:

(config) # no chassis box-id <node_boxid_to_be_removed>

2. Complete the following steps to rediscover the master device in GigaVUE-FM:
a. Navigate to Physical > Physical Nodes.
b. Select the check box of the master device or cluster ID that you want to rediscover.
c. From the Actions drop-down list, select Rediscover.

The box ID is removed from the running configuration of the master device. You can now add the same box ID to the cluster.
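The error message and step 1 above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the separator between multiple box IDs inside the brackets is an assumption (commas and/or whitespace), and the generated commands still have to be run manually on the master device.

```python
import re

# Assumed message shape, taken from the error text above. How several box IDs
# are separated inside the brackets is an assumption (commas and/or spaces).
ERROR_PATTERN = re.compile(r"Box-id \[([^\]]*)\] occupied by offline chassis")


def occupied_box_ids(message: str) -> list[str]:
    """Return the box IDs named in the error message, or [] if it does not match."""
    match = ERROR_PATTERN.search(message)
    if match is None:
        return []
    return [token for token in re.split(r"[,\s]+", match.group(1)) if token]


def removal_commands(box_ids: list[str]) -> list[str]:
    """Build one 'no chassis box-id' command per box ID (step 1 above)."""
    return [f"no chassis box-id {box_id}" for box_id in box_ids]


if __name__ == "__main__":
    # Hypothetical sample message with two stale box IDs.
    sample = "Box-id [3, 7] occupied by offline chassis, choose different box-id(s)"
    for command in removal_commands(occupied_box_ids(sample)):
        print(command)
```

The sketch only prepares the command text; rediscovering the master device in GigaVUE-FM (step 2) remains a manual action.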

Unable to Upgrade Nodes in a Cluster

Problem Description: When you upgrade the nodes in a cluster, some nodes are upgraded successfully while others are not.

Corrective Action: Downgrade all the successfully upgraded nodes back to the previous software version and restart the upgrade.
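To decide which nodes need to be rolled back, it can help to compare each node's software version with the previous version. The following is a minimal sketch, assuming you have already collected the node names and version strings by some other means; the node names, the dictionary layout, and the helper names are all hypothetical, and versions are assumed to be dotted numeric strings such as 6.0.00.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '6.0.00' into (6, 0, 0)."""
    return tuple(int(part) for part in text.split("."))


def nodes_to_roll_back(node_versions: dict[str, str], previous_version: str) -> list[str]:
    """Return the nodes whose version is newer than the previous cluster version."""
    previous = parse_version(previous_version)
    return sorted(
        name
        for name, version in node_versions.items()
        if parse_version(version) > previous
    )


if __name__ == "__main__":
    # Hypothetical cluster state after a partially failed upgrade.
    cluster = {"node-a": "6.1.00", "node-b": "6.0.00", "node-c": "6.1.00"}
    print(nodes_to_roll_back(cluster, "6.0.00"))  # ['node-a', 'node-c']
```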

IMPORTANT: Downgrading GigaVUE nodes from a higher software version to a lower software version is strongly discouraged.

Perform the following steps on each node that is to be downgraded:

1. Use the reset factory all command to reset the node.
2. Replace the image in the next partition with the image of the software version to which you are downgrading. This prevents the script file from fetching the template file from the next version.

When downgrading a node from software version 6.0.00 or later to a version earlier than 6.0.00, the node can occasionally enter a continuous reboot loop after it is reset. If the node is stuck in a continuous reboot loop:

  1. Press the Esc key while the node boots up to modify the kernel parameters.
  2. Press e to edit the kernel parameters.
  3. Change the rw option to ro.

For example, change:

kernel /vmlinuz rw root=/dev/sda5 img_id=1 loglevel=3 panic=10 pci=noaer mem=34668M memmap=2176M$1920M console=tty0 console=ttyS0,115200n8 pcie_aspm=off

to

kernel /vmlinuz ro root=/dev/sda5 img_id=1 loglevel=3 panic=10 pci=noaer mem=34668M memmap=2176M$1920M console=tty0 console=ttyS0,115200n8 pcie_aspm=off
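The version condition and the rw-to-ro edit above can be expressed as a short Python sketch. It is illustrative only: the function names are hypothetical, versions are assumed to be dotted numeric strings, and the actual edit is made interactively in the boot menu, not by a script.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '6.0.00' into (6, 0, 0)."""
    return tuple(int(part) for part in text.split("."))


def crosses_6_0_boundary(current: str, target: str) -> bool:
    """True when downgrading from >= 6.0.00 to < 6.0.00, the case described above."""
    boundary = (6, 0, 0)
    return parse_version(current) >= boundary and parse_version(target) < boundary


def set_root_read_only(kernel_line: str) -> str:
    """Replace the standalone 'rw' option with 'ro', leaving other options untouched."""
    tokens = kernel_line.split(" ")
    return " ".join("ro" if token == "rw" else token for token in tokens)


if __name__ == "__main__":
    line = "kernel /vmlinuz rw root=/dev/sda5 img_id=1 loglevel=3 panic=10"
    # Hypothetical versions chosen only to exercise the boundary check.
    if crosses_6_0_boundary("6.1.00", "5.16.00"):
        print(set_root_read_only(line))
```

Matching whole tokens, not substrings, avoids touching other options that happen to contain the letters rw.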