EXALOGIC: INFINIBAND: Command to Check Symbol Errors on IB Ports on Compute Nodes (How To Doc)

The following command can be used on Exalogic compute nodes to check for symbol errors on the IB ports.
cat /sys/class/infiniband/mlx4_0/ports/*/counters/symbol_error
Below is example output of the above command. In this case there are zero symbol errors on both IB ports.
# cat /sys/class/infiniband/mlx4_0/ports/*/counters/symbol_error 
0 
0 
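
The command above prints bare values without saying which port each one belongs to. As a minimal sketch, assuming a POSIX shell on the compute node, the hypothetical helper list_symbol_errors below labels each value with its adapter and port. It walks every adapter under /sys/class/infiniband rather than only mlx4_0, which is an assumption beyond the original command.

```shell
# Hypothetical helper: print the symbol_error counter for each IB port,
# labelled with adapter and port number.
# $1 is the sysfs base directory (defaults to /sys/class/infiniband).
list_symbol_errors() {
    base_dir="${1:-/sys/class/infiniband}"
    for f in "$base_dir"/*/ports/*/counters/symbol_error; do
        [ -r "$f" ] || continue                  # glob matched nothing readable
        port_dir="${f%/counters/symbol_error}"   # e.g. .../mlx4_0/ports/1
        port="${port_dir##*/}"                   # e.g. 1
        hca_dir="${port_dir%/ports/*}"           # e.g. .../mlx4_0
        hca="${hca_dir##*/}"                     # e.g. mlx4_0
        printf '%s port %s: %s\n' "$hca" "$port" "$(cat "$f")"
    done
}
```

Calling list_symbol_errors with no argument reads the live counters; passing a directory is only useful for trying the helper against a copy of the sysfs tree.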

Products to which Article Applies
Exalogic racks using Infiniband Switches
Article Author: Tarun Boyella

EXALOGIC: INFINIBAND: "perfquery" command to check Symbol Errors and Other details of Compute Nodes Ports (How To Doc)

The following perfquery command can be executed on a compute node to check the symbol errors and other details of the compute node's ports.
perfquery <Baselid> <Port>
The Base lid and port number can be taken from the output of the "ibstat" command on the compute node. Below is example output of the "ibstat" command.
CA 'mlx4_0' 
CA type: MT26428 
Number of ports: 2 
Firmware version: 2.11.2010 
Hardware version: b0 
Node GUID: XXXXXXXXXXXXX 
System image GUID: XXXXXXXXXXXXX 

Port 1: 
State: Active 
Physical state: LinkUp 
Rate: 40 
Base lid: 7 
LMC: 0 
SM lid: 41 
Capability mask: 0x02510868 
Port GUID: XXXXXXXXXXXXX 
Link layer: IB 

Port 2: 
State: Active 
Physical state: LinkUp 
Rate: 40 
Base lid: 9 
LMC: 0 
SM lid: 41 
Capability mask: 0x02510868 
Port GUID: XXXXXXXXXXXXX 
Link layer: IB 
From the above output we can see the Base lid for Ports 1 and 2 of the compute node. To run perfquery on Port 1, which has Base lid 7, the command looks as follows:

perfquery 7 1
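
Reading the Base lid of each port off the ibstat output by hand can be sketched as a small filter. The helper name port_lids is hypothetical, and the sketch assumes ibstat prints "Port N:" and "Base lid: N" lines in the layout shown above.

```shell
# Hypothetical helper: read ibstat output on stdin and print one
# "<port> <base lid>" pair per line.
port_lids() {
    awk '
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:$/, "", port) }  # remember current port
        /Base lid:/           { print port, $3 }                  # emit its LID
    '
}
```

With this in place, something like ibstat | port_lids | while read -r port lid; do perfquery "$lid" "$port"; done would run perfquery once per port.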
Below is example output of the above command. If the SymbolErrorCounter value increases across repeated runs, it indicates an issue with the compute node IB port (bad cable, loose cable or other issues).
# Port counters: Lid 7 port 1 (CapMask: 0x1400) 
PortSelect:......................1 
CounterSelect:...................0x0000 
SymbolErrorCounter:..............0 
LinkErrorRecoveryCounter:........0 
LinkDownedCounter:...............0 
PortRcvErrors:...................0 
PortRcvRemotePhysicalErrors:.....0 
PortRcvSwitchRelayErrors:........0 
PortXmitDiscards:................2 
PortXmitConstraintErrors:........0 
PortRcvConstraintErrors:.........0 
CounterSelect2:..................0x00 
LocalLinkIntegrityErrors:........0 
ExcessiveBufferOverrunErrors:....0 
VL15Dropped:.....................0 
PortXmitData:....................4294967295 
PortRcvData:.....................4294967295 
PortXmitPkts:....................4294967295 
PortRcvPkts:.....................1088593887 
PortXmitWait:....................65418428 
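
To watch a single counter rather than the whole listing, the value can be pulled out of the perfquery output. This is a minimal sketch: perf_counter is a hypothetical helper, and it assumes the "Name:....value" layout shown above. Note also that 4294967295 is the largest value a 32-bit counter can hold, so the data and packet counters showing that number have most likely reached their ceiling rather than reporting a real total.

```shell
# Hypothetical helper: read perfquery output on stdin and print the value
# of the counter named in $1 (name exactly as perfquery prints it).
perf_counter() {
    sed -n "s/^$1:\.\.*\([0-9][0-9]*\).*/\1/p"
}
```

For example, perfquery 7 1 | perf_counter SymbolErrorCounter prints just the symbol error count, which can be recorded at intervals and compared.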

Products to which Article Applies
Exalogic racks using Infiniband Switches

Article Author: Tarun Boyella

INFINIBAND: "getportcounters" Command to check if there are any Symbol Errors, Link Issues on Switch Ports of IB Switch.

The "getportcounters" command below can be used to check for symbol errors and link issues on the switch ports of an IB switch.
getportcounters <Switch Port>
For example, to check the counters on port 6, the command looks as follows:
getportcounters 6
Below is sample output of the above command. If SymbolErrors keeps increasing across repeated runs, it indicates an issue with the switch port (loose cable, bad cable, bad switch port or something else).
SymbolErrors.....................0 
LinkRecovers.....................0 
LinkDowned.......................0 
RcvErrors........................0 
RcvRemotePhysErrors..............0 
RcvSwRelayErrors.................297 
XmtDiscards......................6 
XmtConstraintErrors..............0 
RcvConstraintErrors..............0 
LinkIntegrityErrors..............0 
ExcBufOverrunErrors..............0 
VL15Dropped......................0 
XmtData..........................4294967295 
RcvData..........................4294967295 
XmtPkts..........................717430186 
RcvPkts..........................634187063 
XmtWait..........................1338332 
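
Since the useful signal is whether a counter grows over time, one way to check is to save the output twice and subtract. The sketch below assumes the "Name....value" layout shown above and that the two outputs were saved to files. The helper names switch_counter and counter_delta are hypothetical.

```shell
# Hypothetical helper: read getportcounters output on stdin and print the
# value of the counter named in $1.
switch_counter() {
    sed -n "s/^$1\.\.*\([0-9][0-9]*\).*/\1/p"
}

# Hypothetical helper: print how much a counter grew between two saved outputs.
# $1 = counter name, $2 = earlier output file, $3 = later output file.
# A counter missing from a file is treated as 0.
counter_delta() {
    before=$(switch_counter "$1" < "$2")
    after=$(switch_counter "$1" < "$3")
    echo $(( ${after:-0} - ${before:-0} ))
}
```

For example, after getportcounters 6 > before.txt and, some minutes later, getportcounters 6 > after.txt, running counter_delta SymbolErrors before.txt after.txt prints the increase. A negative result would suggest the counters were reset between the two samples.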

Products to which Article Applies
Infiniband Switches

Article Author: Tarun Boyella

EXALOGIC: INFINIBAND: "getportcounters" Command to Check If there are Issues/Problems and Errors On 40Gbits/sec Internal Connector Ports of IB Switch (How To Doc)

The getportcounters command below can be run on specific 40Gbits/sec IB switch connector ports to check for symbol errors.
# getportcounters <internal connector port>

In the above command, replace <internal connector port> with the connector port you want to check. The switch's 40Gbits/sec connector ports range from 0A to 15A and from 0B to 15B.

SymbolErrors should not increase, whether gradually or sharply, when this command is run repeatedly over a period of time. If SymbolErrors does increase, it indicates an issue with the connector port on the IB switch. It may be loose cabling, a bad cable, a bad switch port or something else.

As an example, to check errors on connector 13B, the command and sample output look as follows:

# getportcounters 13B
Port counters for connector 13B Switch port 10
SymbolErrors.....................0
LinkRecovers.....................20
LinkDowned.......................0
RcvErrors........................0
RcvRemotePhysErrors..............0
RcvSwRelayErrors.................15
XmtDiscards......................23
XmtConstraintErrors..............0
RcvConstraintErrors..............0
LinkIntegrityErrors..............0
ExcBufOverrunErrors..............0
VL15Dropped......................0
XmtData..........................4294967295
RcvData..........................566477715
XmtPkts..........................130024470
RcvPkts..........................9198929
XmtWait..........................870
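
Checking every connector one at a time is tedious, so the connector names in the 0A to 15A and 0B to 15B range can be generated instead. This is a minimal sketch, assuming the switch shell is POSIX-compatible. The helper name list_connectors is hypothetical.

```shell
# Hypothetical helper: print the 32 connector names 0A, 0B, 1A, 1B ... 15A, 15B,
# one per line.
list_connectors() {
    n=0
    while [ "$n" -le 15 ]; do
        printf '%sA\n%sB\n' "$n" "$n"
        n=$(( n + 1 ))
    done
}
```

A loop such as for c in $(list_connectors); do printf '%s ' "$c"; getportcounters "$c" | grep '^SymbolErrors'; done would then print the SymbolErrors line for each connector in turn.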


Products to which Article Applies

Infiniband Switches.


Additional References

https://docs.oracle.com/cd/E19671-01/835-0793-03/z40000471973870.html


INFINIBAND: "getportcounters" Command to Check If there are Issues/Problems and Errors On 10Gbits/sec ETH Ethernet Connector Ports of IB Switch (How To Doc)

The getportcounters commands below can be run on the 0A-ETH and 1A-ETH Ethernet connector ports of the IB switch to check for any RX, TX and CRC errors.
# getportcounters 0a-eth | egrep -i "error|CRC"
# getportcounters 1a-eth | egrep -i "error|CRC"

Below is example output of the first command.
# getportcounters 0a-eth | egrep -i "error|CRC"
RX CRC...........................0
RX errors........................0
TX errors........................0
RX CRC...........................0
RX errors........................0
TX errors........................0
RX CRC...........................0
RX errors........................0
TX errors........................0
RX CRC...........................0
RX errors........................0
TX errors........................0

If RX errors, TX errors or RX CRC errors increase, whether gradually or sharply, over a period of time, it indicates an issue with the 0A/1A ETH connector ports on the IB switch. It may be loose cabling, a bad cable, a bad switch port or something else.
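
Because the filtered output repeats the same three counters several times, a single total is easier to track over time than a dozen separate lines. The sketch below assumes the "Name....value" layout shown above and simply sums every RX CRC, RX errors and TX errors line it sees. The helper name eth_error_total is hypothetical.

```shell
# Hypothetical helper: read getportcounters output for an ETH connector on
# stdin and print the sum of all RX CRC, RX errors and TX errors values.
eth_error_total() {
    awk '
        /^(RX CRC|RX errors|TX errors)\.+[0-9]+/ {
            value = $0
            sub(/^[^.]*\.+/, "", value)   # drop the label and the dot padding
            total += value + 0
        }
        END { print total + 0 }
    '
}
```

For example, getportcounters 0a-eth | eth_error_total prints one number. If that number grows between runs, the individual lines can then be inspected to see which counter is responsible.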



Products to which Article Applies

Infiniband Switches.

Additional References

https://docs.oracle.com/cd/E19671-01/835-0793-03/z40000471973870.html