I was doing an ONTAP upgrade of a 4-node MetroCluster IP. I was upgrading from 9.7 to 9.10, and the Cisco backend switches were running NX-OS 7.x with RCF 1.4, so they needed to be upgraded to 9.3(9) with RCF 1.6. It was when applying RCF 1.6 that I ran into an issue.
The guide from NetApp dictates that you need to reset the switch to factory settings, clearing all setup, including remote access. So a console connection is mandatory when doing it the official way. But I was not keen to spend that much time in the two datacenters just to upgrade a switch, and Cisco says it can be done remotely.
The issue with applying the RCF was that it did not add the new VLAN created for local cluster traffic to the trunk connecting the switches in the same DC.
In RCF 1.4 local cluster traffic is on VLAN 1, and in RCF 1.6 it is on VLAN 101 (for site A) and VLAN 201 (for site B).
Ports Eth1/1 and Eth1/2 are access ports, and their VLAN was changed to 101 without problems. But the port-channel Po1 (the uplink to the other switch) was not allowing VLAN 101. This was because the port-channel had already been formed by the old RCF, so the changes to it were ignored when RCF 1.6 was applied, and that included the tagging of VLAN 101.
The consequence of this was that half the cluster paths (as seen by cluster ping-cluster) were down. This was first noticed during the upgrade of the second switch in site A, when the cluster lost quorum and several of the cluster services went offline.
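To see why exactly half of the paths go down, here is a small Python sketch of the cabling. It is a toy model, not something taken from the switches: it assumes each node has two cluster ports (e4a and e4e, the names that show up in the ping-cluster output later in this post), that e4a goes to the first switch and e4e to the second, and that a path between ports on different switches only works if the trunk between the switches carries the cluster VLAN.

```python
from itertools import product

# Assumption (not stated in the post): on each node, e4a is cabled to
# switch 1 and e4e to switch 2 of the local site.
PORT_TO_SWITCH = {"e4a": 1, "e4e": 2}


def path_works(local_port, remote_port, isl_carries_vlan):
    """A path works if both ports share a switch, or the ISL trunk carries the cluster VLAN."""
    same_switch = PORT_TO_SWITCH[local_port] == PORT_TO_SWITCH[remote_port]
    return same_switch or isl_carries_vlan


def count_paths(isl_carries_vlan):
    """Return (working, failing) over all local/remote port pairs between two nodes."""
    results = [
        path_works(local_port, remote_port, isl_carries_vlan)
        for local_port, remote_port in product(PORT_TO_SWITCH, repeat=2)
    ]
    return results.count(True), results.count(False)


print(count_paths(isl_carries_vlan=False))  # (2, 2): trunk does not carry the VLAN
print(count_paths(isl_carries_vlan=True))   # (4, 0): trunk carries the VLAN
```

With the VLAN missing on the trunk, only the two paths that stay on a single switch survive, which matches the 2 up / 2 down that ping-cluster reports.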
motd from RCF 1.4
| ***********************************************************************************
| * NetApp Reference Configuration File (RCF)
| *
| * Switch : NX3132Q-V
| * Filename : 1.4-MetroCluster-IP-B1
| * Date : 27-Jun-2018
| *
| * Port Usage:
| * Ports 1- 6: Intra-Cluster Node Ports
| * Ports 7- 8: Intra-Cluster ISL Ports
| * Ports 9-14: MetroCluster-IP Node Ports, VLAN 10
| * Ports 15-20: MetroCluster-IP ISL Ports, VLAN 10, Port Channel 10, native mo
| * Ports 21-24: MetroCluster-IP ISL Ports, VLAN 10, Port Channel 11, breakout
| *
| * hardware profile portmode 24x40g
| *
| ***********************************************************************************
motd from RCF 1.6
*************************************************************************************
* NetApp Reference Configuration File (RCF)
*
* Switch : NX3132Q-V (direct storage, L2 Networks, direct ISL)
* Filename : NX3132_v1.90_B1.txt
* Date : Generator: v1.4a_2022-07-18_001, file creation: 2022-09-30, 14:26:14
*
* Platforms : 1: MetroCluster 1 : FAS9000, AFF-A700, AFF-A800
*
* Port Usage:
* Ports 1- 2: Intra-Cluster Node Ports, Cluster: MetroCluster 1, VLAN 201
* Ports 3- 4: Ports not used
* Ports 5- 6: Ports not used
* Ports 7- 8: Intra-Cluster ISL Ports, local cluster, VLAN 201
* Ports 9-10: MetroCluster 1, Node Ports, VLAN 10
* Ports 11-12: Ports not used
* Ports 13-14: Ports not used
* Ports 15-20: MetroCluster-IP ISL Ports, VLAN 10, Port Channel 10
* Ports 21-24: MetroCluster-IP ISL Ports, VLAN 10, Port Channel 11, breakout mode 10gx4
* Ports 25-32: Disabled
*
*************************************************************************************
Here is the output from the switch when applying the RCF:
mcc-sw-B1# copy bootflash:NX3132_v1.90_B1.txt running-config
Please copy running-config to startup-config and reload the switch to apply changes.
Error: Can't disable/re-enable ssh:Current user is logged in through ssh
Warning: Please save config and reload the system for the configuration to take effect
Warning: SPAN source vlan, filter vlan, source NS interface and source forward-drops features and sFlow data source on NS interface will NOT be supported after this configuration change.
Warning: sFlow data source on NS interface will NOT be supported after this configuration change.
Warning: vpc convergence region is required for vxlan feature functionality in a VPC environment. Vxlan feature in VPC will not be supported after this configuration change.
Warning: Please save config and reload the system for the configuration to take effect
ERROR: switchport trunk allowed vlan 201
ERROR: : port already in a port-channel, no config allowed
Copy complete, now saving to disk (please wait)...
Copy complete.
We get three errors. The first is that SSH cannot be reconfigured. That is fine.
The other two are about applying VLAN 201 on a trunk and changing the config of a member of a port-channel. This is NOT fine. But so as not to edit startup-config directly, I saved the new running-config and reloaded.
mcc-sw-B1# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete.
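The errors from applying the RCF are easy to miss between all the warnings. If you capture the session output to a file, a few lines of Python can pick out the error lines that are not expected. This is only a sketch: the function name and the list of "expected" errors are my own choices, and the only entry on that list is the SSH message, which is harmless when you are logged in over SSH.

```python
import re

# Assumption: anything matching these patterns is expected when applying the
# RCF over SSH and can be ignored; extend the list for your own environment.
EXPECTED = [re.compile(r"Can't disable/re-enable ssh", re.IGNORECASE)]


def unexpected_errors(output):
    """Return error lines from the copy output that are not on the expected list."""
    errors = []
    for line in output.splitlines():
        line = line.strip()
        if not line.upper().startswith("ERROR:"):
            continue  # warnings and progress lines are skipped
        if any(pattern.search(line) for pattern in EXPECTED):
            continue  # known and harmless
        errors.append(line)
    return errors


sample = """\
Error: Can't disable/re-enable ssh:Current user is logged in through ssh
Warning: Please save config and reload the system for the configuration to take effect
ERROR: switchport trunk allowed vlan 201
ERROR: : port already in a port-channel, no config allowed
Copy complete.
"""

for line in unexpected_errors(sample):
    print(line)
```

Run against the output from this switch, it prints the two port-channel lines and nothing else, which is exactly the part that needs attention before moving on to the next switch.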
I did a reload to activate all the changes, and then checked the VLANs to see on which ports they had been allowed.
mcc-sw-B1# show vlan
VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------
1 default active Po1, Eth1/3, Eth1/4, Eth1/5
Eth1/6, Eth1/7, Eth1/8
10 VLAN0010 active Po10, Po11, Eth1/9, Eth1/10
Eth1/11, Eth1/12, Eth1/13
Eth1/14, Eth1/15, Eth1/16
Eth1/17, Eth1/18, Eth1/19
Eth1/20, Eth1/21/1, Eth1/21/2
Eth1/21/3, Eth1/21/4, Eth1/22/1
Eth1/22/2, Eth1/22/3, Eth1/22/4
Eth1/23/1, Eth1/23/2, Eth1/23/3
Eth1/23/4, Eth1/24/1, Eth1/24/2
Eth1/24/3, Eth1/24/4
201 VLAN0201 active Eth1/1, Eth1/2
VLAN Type Vlan-mode
---- ----- ----------
1 enet CE
10 enet CE
201 enet CE
Remote SPAN VLANs
-------------------------------------------------------------------------------
Primary Secondary Type Ports
------- --------- --------------- ------------------------------------------
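The thing to look for in this output is that Po1 is listed under VLAN 1 but not under VLAN 201. The same check can be scripted against captured `show vlan` output. The Python below is a minimal sketch under a few assumptions: it only reads the first table, it treats any line that starts with an Eth or Po name as a continuation of the previous VLAN row, and the function names are made up for this example.

```python
import re

# "<id> <name> <status> <ports...>" rows of the first table
VLAN_ROW = re.compile(r"^(\d+)\s+(\S+)\s+(\S+)\s*(.*)$")
# Continuation lines start with an interface name
PORT = re.compile(r"^(Eth|Po)\S*$")


def parse_show_vlan(output):
    """Map VLAN ID -> list of ports from the first table of `show vlan`."""
    vlans = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("VLAN Type"):
            break  # the second table has no port information
        row = VLAN_ROW.match(line)
        if row:
            current = int(row.group(1))
            vlans[current] = []
            ports_text = row.group(4)
        elif current is not None and line and PORT.match(line.split(",")[0].strip()):
            ports_text = line
        else:
            continue  # headers, separators, blank lines
        vlans[current].extend(p.strip() for p in ports_text.split(",") if p.strip())
    return vlans


def vlan_missing_on(vlans, vlan_id, port):
    """True if the port does not carry the given VLAN."""
    return port not in vlans.get(vlan_id, [])


sample = """\
VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------
1 default active Po1, Eth1/3, Eth1/4, Eth1/5
Eth1/6, Eth1/7, Eth1/8
201 VLAN0201 active Eth1/1, Eth1/2
VLAN Type Vlan-mode
---- ----- ----------
1 enet CE
201 enet CE
"""

vlans = parse_show_vlan(sample)
print(vlan_missing_on(vlans, 201, "Po1"))  # True
```

A check like this, run after the reload of each switch, would have flagged the missing VLAN on the trunk before the second switch in the site was touched.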
At this time both switches in site B had been upgraded, and the NetApp cluster was showing a few errors:
mcc_B::*> cluster ring show
Node UnitName Epoch DB Epoch DB Trnxs Master Online
--------- -------- -------- -------- -------- --------- ---------
mcc_B-01
mgmt 20 20 1 mcc_B-01
master
mcc_B-01
vldb 18 18 1 mcc_B-01
master
mcc_B-01
vifmgr 0 21 2 - offline
mcc_B-01
bcomd 15 15 1 mcc_B-01
master
mcc_B-01
crs 14 14 15 mcc_B-02
secondary
mcc_B-02
mgmt 20 20 1 mcc_B-01
secondary
mcc_B-02
vldb 0 18 1 - offline
mcc_B-02
vifmgr 0 21 1 - offline
mcc_B-02
bcomd 15 15 1 mcc_B-01
secondary
mcc_B-02
crs 14 14 15 mcc_B-02
master
10 entries were displayed.
mcc_B::*> cluster ping-cluster -node mcc_B-01
Host is mcc_B-01
Getting addresses from network interface table...
Getting addresses from sitelist...
Local = xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx
Remote = xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx
Cluster Vserver Id = xxxxxxxxxx
Ping status:
....
Basic connectivity succeeds on 2 path(s)
Basic connectivity fails on 2 path(s)
Local xxx.xxx.xxx.xxx to Remote xxx.xxx.xxx.xxx
Local xxx.xxx.xxx.xxx to Remote xxx.xxx.xxx.xxx
................
Detected 9000 byte MTU on 2 path(s):
Local xxx.xxx.xxx.xxx to Remote xxx.xxx.xxx.xxx
Local xxx.xxx.xxx.xxx to Remote xxx.xxx.xxx.xxx
Larger than PMTU communication succeeds on 2 path(s)
RPC status:
2 paths up, 0 paths down (tcp check)
2 paths up, 0 paths down (udp check)
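The two "Basic connectivity" lines are the quickest health indicator in this output. If you want to check them automatically between switch upgrades, they can be pulled out of the captured text with a regular expression. This is a sketch in Python; the function name is my own, and it assumes the summary lines are worded exactly as in the output above.

```python
import re

SUMMARY = re.compile(r"Basic connectivity (succeeds|fails) on (\d+) path\(s\)")


def connectivity_summary(output):
    """Return {'succeeds': n, 'fails': m} from `cluster ping-cluster` output."""
    counts = {"succeeds": 0, "fails": 0}
    for outcome, number in SUMMARY.findall(output):
        counts[outcome] = int(number)
    return counts


sample = """\
Ping status:
....
Basic connectivity succeeds on 2 path(s)
Basic connectivity fails on 2 path(s)
"""

summary = connectivity_summary(sample)
print(summary)  # {'succeeds': 2, 'fails': 2}
if summary["fails"] > 0:
    print("Cluster paths are down, check the switches before continuing")
```

Anything other than zero failing paths is a reason to stop and look at the switches before upgrading the next one.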
To fix it, I allowed VLAN 201 on Po1:
mcc-sw-B1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
mcc-sw-B1(config)# interface Po1
mcc-sw-B1(config-if)# switchport trunk allowed vlan add 201
mcc-sw-B1(config-if)# exit
mcc-sw-B1(config)# exit
mcc-sw-B1# show vlan
VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------
1 default active Po1, Eth1/3, Eth1/4, Eth1/5
Eth1/6, Eth1/7, Eth1/8
10 VLAN0010 active Po10, Po11, Eth1/9, Eth1/10
Eth1/11, Eth1/12, Eth1/13
Eth1/14, Eth1/15, Eth1/16
Eth1/17, Eth1/18, Eth1/19
Eth1/20, Eth1/21/1, Eth1/21/2
Eth1/21/3, Eth1/21/4, Eth1/22/1
Eth1/22/2, Eth1/22/3, Eth1/22/4
Eth1/23/1, Eth1/23/2, Eth1/23/3
Eth1/23/4, Eth1/24/1, Eth1/24/2
Eth1/24/3, Eth1/24/4
201 VLAN0201 active Po1, Eth1/1, Eth1/2, Eth1/7
Eth1/8
VLAN Type Vlan-mode
---- ----- ----------
1 enet CE
10 enet CE
201 enet CE
Remote SPAN VLANs
-------------------------------------------------------------------------------
Primary Secondary Type Ports
------- --------- --------------- ------------------------------------------
Now the NetApp cluster was happy again:
mcc_B::*> cluster ring show
Node UnitName Epoch DB Epoch DB Trnxs Master Online
--------- -------- -------- -------- -------- --------- ---------
mcc-B-01
mgmt 52 52 874 mcc-B-01
master
mcc-B-01
vldb 44 44 6 mcc-B-01
master
mcc-B-01
vifmgr 93 93 72 mcc-B-01
master
mcc-B-01
bcomd 15 15 1 mcc-B-01
master
mcc-B-01
crs 14 14 15 mcc-B-02
secondary
mcc-B-02
mgmt 52 52 874 mcc-B-01
secondary
mcc-B-02
vldb 44 44 6 mcc-B-01
secondary
mcc-B-02
vifmgr 93 93 72 mcc-B-01
secondary
mcc-B-02
bcomd 15 15 1 mcc-B-01
secondary
mcc-B-02
crs 14 14 15 mcc-B-02
master
10 entries were displayed.
mcc-B::*> cluster ping-cluster -node mcc-B-01
Host is mcc-B-02
Getting addresses from network interface table...
Cluster mcc-B-01_clus1 xxx.xxx.xxx.xxx mcc-B-01 e4a
Cluster mcc-B-01_clus2 xxx.xxx.xxx.xxx mcc-B-01 e4e
Cluster mcc-B-02_clus1 xxx.xxx.xxx.xxx mcc-B-02 e4a
Cluster mcc-B-02_clus2 xxx.xxx.xxx.xxx mcc-B-02 e4e
Local = xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx
Remote = xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx
Cluster Vserver Id = xxxxxxxxxx
Ping status:
....
Basic connectivity succeeds on 4 path(s)
Basic connectivity fails on 0 path(s)
................
Detected 9000 byte MTU on 4 path(s):
Local xxx.xxx.xxx.xxx to Remote xxx.xxx.xxx.xxx
Local xxx.xxx.xxx.xxx to Remote xxx.xxx.xxx.xxx
Local xxx.xxx.xxx.xxx to Remote xxx.xxx.xxx.xxx
Local xxx.xxx.xxx.xxx to Remote xxx.xxx.xxx.xxx
Larger than PMTU communication succeeds on 4 path(s)
RPC status:
2 paths up, 0 paths down (tcp check)
2 paths up, 0 paths down (udp check)