Solaris IP multipathing (IPMP) provides high availability and load balancing for the networking stack, so that a network interface is not a single point of failure. Problems can appear both while configuring IPMP and after it is in service. Below are some tips and tricks for troubleshooting a Solaris IPMP configuration.
Testing IPMP failover
We can easily test the failure and repair of an interface using the if_mpadm command: “-d” detaches the interface, whereas “-r” reattaches it.
# if_mpadm -d ce0
# if_mpadm -r ce0
Check if in.mpathd daemon is running
The in.mpathd daemon is responsible for detecting and repairing IPMP failures. Check if the process is running on the system :
# ps -ef | grep mpath
root 2222 1 0 20:41:10 ? 0:06 /usr/lib/inet/in.mpathd
If it is not running, start it by running the daemon binary (the path shown in the ps output above) :
# /usr/lib/inet/in.mpathd
To make the in.mpathd daemon re-read the /etc/default/mpathd configuration file after you change it, use :
# pkill -HUP in.mpathd
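The process check above can also be scripted. The following is a minimal sketch, assuming a POSIX-style shell and that pgrep is available on the system; mpathd_status is a hypothetical helper name, not a Solaris command. Using pgrep -x avoids the grep process matching itself, which can happen with ps -ef | grep.

```shell
#!/bin/sh
# mpathd_status: hypothetical helper that reports whether in.mpathd is running.
# pgrep -x matches the process name exactly.
mpathd_status() {
    if pgrep -x in.mpathd >/dev/null 2>&1; then
        echo "in.mpathd is running"
    else
        echo "in.mpathd is NOT running"
        return 1
    fi
}
```

A typical call would be mpathd_status || echo "start the daemon before testing failover".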
Check the messages file
The first thing to do is check the /var/adm/messages file for mpathd-related errors. You may find various errors (as well as informational messages) related to IPMP, such as those shown below. These entries often point directly to the problem in the IPMP configuration.
1. interfaces configured for IPMP showing as "FAILED" in "ifconfig -a" output
2. "Successfully failed over from NIC xxxx to NIC xxxx", "NIC repair detected on ", "Successfully failed back to NIC ", "The link has come up on ", "The link has gone down on "
3. "No test address configured on interface disabling probe-based failure detection on it"
4. "Test address address is not unique; disabling probe based failure detection on "
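To get a quick overview of the failover and link events listed above, the messages file can be summarized with grep and awk. This is only a sketch: ipmp_log_summary is a hypothetical helper name, and it assumes that the daemon's syslog lines contain the tag "in.mpathd". On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
#!/bin/sh
# ipmp_log_summary [FILE]: hypothetical helper that counts in.mpathd events.
# Assumption: lines logged by the daemon contain the string "in.mpathd".
ipmp_log_summary() {
    logfile=${1:-/var/adm/messages}
    grep 'in\.mpathd' "$logfile" | awk '
        /failed over/        { failover++ }
        /failed back/        { failback++ }
        /link has gone down/ { down++ }
        /link has come up/   { up++ }
        END {
            printf "failover=%d failback=%d link_down=%d link_up=%d\n",
                   failover, failback, down, up
        }'
}
```

Called without arguments, ipmp_log_summary reads /var/adm/messages; a rotated file such as /var/adm/messages.0 can be passed as the first argument.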
Check the Flags in ifconfig command output
The ifconfig -a command output displays the various flags related to IPMP and interface configuration.
1. interfaces configured for IPMP missing the "UP" and/or "RUNNING" flag in the ifconfig -a output
2. interfaces configured for IPMP showing as "FAILED" in "ifconfig -a" output
The various flags related to IPMP and their meanings are :
deprecated -> can only be used as test address for IPMP and not for any actual data transfer by applications.
-failover -> does not failover when the interface fails
standby -> makes the interface to be used as standby
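The flag checks described above can be automated by parsing the ifconfig -a output. The sketch below assumes the usual header line layout, name: flags=NNN<FLAG,FLAG,...>, and check_ipmp_flags is a hypothetical helper name. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
#!/bin/sh
# check_ipmp_flags: hypothetical helper; reads "ifconfig -a" output on stdin
# and reports interfaces that lack UP or RUNNING, or that carry FAILED.
check_ipmp_flags() {
    awk '
    /flags=/ {
        ifname = $1
        sub(/:$/, "", ifname)               # strip the trailing colon
        start = index($0, "<")
        stop  = index($0, ">")
        if (start == 0 || stop <= start) next
        # Wrap the flag list in commas so each flag can be matched whole.
        flags = "," substr($0, start + 1, stop - start - 1) ","
        if (index(flags, ",UP,") == 0)      print ifname ": missing UP"
        if (index(flags, ",RUNNING,") == 0) print ifname ": missing RUNNING"
        if (index(flags, ",FAILED,") > 0)   print ifname ": FAILED"
    }'
}
```

Usage would be ifconfig -a | check_ipmp_flags. The output includes every interface on the system, not only IPMP group members, so lines for unrelated interfaces should be ignored.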
If the interface is not showing the RUNNING flag, check the output of any of the commands below to confirm that there is a working link between the server and the switch port.
# ndd -get /dev/[interface] adv_autoneg_cap -- set the interface instance first, before reading the auto-negotiation property value
# kstat -p |grep e1000g:0 |grep auto
# dladm show-dev
Ensure that the switch port is set to auto-negotiate. Disconnect and reconnect the Ethernet cable on the server side to renegotiate the link speed with the switch port.
If the interface is not showing the UP flag, use :
# ifconfig [interface in group] up
Determine if the default router is properly answering ICMP probes
Probe-based IPMP sends ICMP probes to any on-link routers and listens for their responses. We can watch the snoop command output to confirm that the on-link router is responding to pings. The in.mpathd daemon uses test addresses to exchange ICMP probes, also called probe traffic, with other targets on the IP link. Probe traffic helps to determine the status of the interface and its NIC, including whether an interface has failed. The probes verify that the send and receive paths of the interface are working correctly.
In the first window :
geeklab # snoop -d hme0 icmp
Using device /dev/hme (promiscuous mode)
In the second window :
geeklab # ping 192.168.1.1
192.168.1.1 is alive
Here 192.168.1.1 is the default router. You can check the default router in the netstat -nrv output.
Now in the first window you should be able to see the traffic :
geeklab -> 192.168.1.1 ICMP Echo request (ID: 1023 Sequence number: 0)
192.168.1.1 -> geeklab ICMP Echo reply (ID: 1023 Sequence number: 0)
Here the first line is the outgoing ICMP request (the “ping”) and the second line is the ICMP reply.
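When the capture is longer than a couple of lines, comparing the number of requests and replies by eye gets tedious. The sketch below counts both in snoop output read from stdin, relying only on the "ICMP Echo request" and "ICMP Echo reply" strings shown above; count_icmp_probes is a hypothetical helper name. On Solaris, nawk may be needed in place of awk.

```shell
#!/bin/sh
# count_icmp_probes: hypothetical helper; reads snoop output on stdin,
# prints the request and reply counts, and returns non-zero when there
# are more requests than replies.
count_icmp_probes() {
    awk '
    /ICMP Echo request/ { req++ }
    /ICMP Echo reply/   { rep++ }
    END {
        printf "requests=%d replies=%d\n", req, rep
        if (req > rep) exit 1
    }'
}
```

For example, snoop -d hme0 -c 20 icmp | count_icmp_probes captures 20 ICMP packets and then prints the totals. A request that is still in flight when the capture stops will be counted as unanswered, so a difference of one is not necessarily a problem.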
If you are using probe-based IPMP (an interface marked with -failover), use pkill to make in.mpathd write a debug snapshot, then check /var/adm/messages for “probes lost” messages:
# pkill -USR1 mpathd
# tail -20 /var/adm/messages
Are systems on the subnet able to respond to all-hosts multicast?
Use netstat and check for the interfaces’ membership in 224.0.0.1 :
geeklab # netstat -gn | grep 224.0.0.1
lo0 224.0.0.1 1
hme0 224.0.0.1 1
If the netstat -gn output shows interfaces that cannot respond to ALL-SYSTEMS multicast (224.0.0.1), then add the host route using the route -p command.
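The membership check can be scripted for a list of interfaces. The sketch below assumes that each netstat -gn data line has the interface name in the first column and the group address in the second, and uses 224.0.0.1, the ALL-SYSTEMS (all-hosts) multicast group; missing_all_hosts is a hypothetical helper name. On Solaris, nawk may be needed in place of awk.

```shell
#!/bin/sh
# missing_all_hosts IF...: hypothetical helper; reads "netstat -gn" output
# on stdin and reports which of the named interfaces have not joined the
# all-hosts multicast group.
missing_all_hosts() {
    group=224.0.0.1
    # Collect the interfaces whose group column matches the all-hosts address.
    members=$(awk -v g="$group" '$2 == g { print $1 }')
    for ifname in "$@"; do
        found=no
        for m in $members; do
            [ "$m" = "$ifname" ] && found=yes
        done
        [ "$found" = yes ] || echo "$ifname: not a member of $group"
    done
}
```

Usage would be netstat -gn | missing_all_hosts hme0 hme1, naming the interfaces in the IPMP group. No output means every named interface is a member.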
Is VCS “Multi-NIC” in use with IPMP?
VCS uses a resource type called Multi-NIC to configure IPMP through the Solaris mpathd daemon. Determine whether VCS is in use by looking for Multi-NIC processes and by checking the /var/adm/messages file for VCS-related (LLT and GAB) entries.
# ps -ef|grep -i multi
# grep -i LLT /var/adm/messages
# grep -i GAB /var/adm/messages
If you are using VCS, check the main.cf file for the configuration details, and use the hastatus command to check that the Multi-NIC resource is configured properly and running.
Contact support with data
The last option, if everything else fails, is to contact Oracle support. Provide the data below to Oracle support for troubleshooting.
# snoop -d (first interface in the group) -o /tmp/ -s 60 -q
# snoop -d (second interface in the group) -o /tmp/ -s 60 -q
Sun Explorer output :
# explorer -- the command may vary with hardware
# dladm show-dev > show-dev.out
# dladm show-link > show-link.out
# dladm show-aggr -L > show-aggr.out