The Problem
Stale iSCSI target entries that are still present on the initiator node can lead to various issues, such as:
1. a hung iscsid service, which can lead to a global outage.
2. a flood of error messages in /var/log/messages, for example:
messages:Jun 3 13:58:00 server1 iscsid: iscsid: Connection5:0 to [target: iqn-name, portal: ip-here,3260] through [iface: default] is shutdown.
messages:Jun 3 13:58:00 server1 iscsiadm: iscsiadm: Could not login to [iface: default, target: iqn-name, portal: ip-here,3260].
messages:Jun 3 13:58:00 server1 iscsiadm: Logging in to [iface: default, target: iqn-name, portal: ip-here,3260] (multiple)
messages:Jun 7 04:24:15 server1 iscsid: iscsid: Connection2:0 to [target: iqn-name, portal: ip-here,3260] through [iface: default] is shutdown.
messages:Jun 7 04:24:15 server1 iscsiadm: iscsiadm: Could not login to [iface: default, target: iqn-name, portal: ip-here,3260].
messages:Jun 7 04:24:15 server1 iscsiadm: Logging in to [iface: default, target: iqn-name, portal: ip-here,3260] (multiple)
3. booting issues (long/hung boot).
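To see which target and portal are behind such a flood, the failed logins can be counted from the log. A minimal sketch, assuming the message format shown above; count_login_failures is a hypothetical name, and the log path (/var/log/messages on RHEL-style systems) may differ on other distributions:

```shell
# Hypothetical helper: summarize "Could not login" errors per target and portal.
# Reads syslog-style lines on stdin and prints "<count> <target> <portal>",
# most frequent first.
count_login_failures() {
    # Keep only "Could not login" lines and extract the target and portal fields.
    sed -n 's/.*Could not login to \[iface: [^,]*, target: \([^,]*\), portal: \([^]]*\)\].*/\1 \2/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2, $3 }'   # drop the padding that uniq -c adds
}

# Usage (path is an assumption, adjust for your system):
#   count_login_failures < /var/log/messages
```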
The Solution
The issue can appear after migrating between iSCSI LUNs on the initiator server, where the old entries were removed by logging out of the old target and deleting its node record with iscsiadm:
# iscsiadm -m node -T [iqn] -p [ip address]:[port number] -u
# iscsiadm -m node -o delete -T [iqn]
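The same two commands can be wrapped in a small function so the logout always runs before the record is deleted. This is a sketch only: cleanup_iscsi_target, run_or_echo and the DRY_RUN switch are hypothetical, and 192.0.2.10 is a documentation placeholder address:

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_or_echo() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: log out of a target portal, then delete its node record.
cleanup_iscsi_target() {
    target=$1   # target IQN
    portal=$2   # "ip:port"
    # Logout fails when no session exists; report it but keep going.
    run_or_echo iscsiadm -m node -T "$target" -p "$portal" -u ||
        echo "logout of $target failed (possibly not logged in)" >&2
    # Delete the node record(s) for this target from the iscsiadm node database.
    run_or_echo iscsiadm -m node -o delete -T "$target"
}

# Usage (preview first, then run without DRY_RUN):
#   DRY_RUN=1 cleanup_iscsi_target iqn-bad-node 192.0.2.10:3260
```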
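Example listing of /var/lib/iscsi/nodes on an affected initiator, where a record for the old target is still present:
drw-------. 2 root root 30 Mar 16 09:35 iqn-good-node
drw-------. 2 root root 30 Aug 12 2018 iqn-bad-node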
/var/lib/iscsi/nodes/iqn-good-node:
-rw-------. 1 root root 2051 Mar 16 09:35 IP_HERE,3260
/var/lib/iscsi/nodes/iqn-bad-node:
-rw-------. 1 root root 2051 Aug 12 2018 IP_HERE,3260
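The dates in such a listing are a useful hint: a record untouched since an earlier migration stands out. A sketch that lists every node record oldest first, assuming GNU find (for -printf) and the layout shown here, where each portal entry sits one level below its target directory; list_node_records is a hypothetical name. Modification time only hints at staleness, so the iscsiadm session check is still required:

```shell
# Hypothetical helper: list node records as "<mtime date> <target>/<portal>",
# oldest first. Takes the nodes directory as an optional argument.
list_node_records() {
    nodes_dir=${1:-/var/lib/iscsi/nodes}
    # Depth 2 = the portal entries under each target directory.
    # %TY-%Tm-%Td is the modification date, %P the path relative to nodes_dir.
    find "$nodes_dir" -mindepth 2 -maxdepth 2 -printf '%TY-%Tm-%Td %P\n' | sort
}

# Usage:
#   list_node_records
```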
The /var/lib/iscsi/nodes listing shows two iSCSI target IQNs. The iqn-bad-node record should be removed, because it produces the errors in the messages file. First use iscsiadm to verify that no session to iqn-bad-node is present on the system and that no active LUNs are assigned from this IQN:
# iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 6.2.0.874-10
Target: iqn-good-node (non-flash)
    Current Portal: IP_HERE:3260,1
    Persistent Portal: IP_HERE:3260,1
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn-good-node
        Iface IPaddress: IP_HERE
        Iface HWaddress:
        Iface Netdev:
        SID: 1
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
        *********
        Timeouts:
        *********
        Recovery Timeout: 6000
        Target Reset Timeout: 30
        LUN Reset Timeout: 30
        Abort Timeout: 15
        *****
        CHAP:
        *****
        username:
        password: ********
        username_in:
        password_in: ********
        ************************
        Negotiated iSCSI params:
        ************************
        HeaderDigest: None
        DataDigest: None
        MaxRecvDataSegmentLength: 262144
        MaxXmitDataSegmentLength: 8192
        FirstBurstLength: 65536
        MaxBurstLength: 262144
        ImmediateData: Yes
        InitialR2T: Yes
        MaxOutstandingR2T: 1
        ************************
        Attached SCSI devices:
        ************************
        Host Number: 2    State: running
        scsi2 Channel 00 Id 0 Lun: 0
        scsi2 Channel 00 Id 0 Lun: 1
            Attached scsi disk sda    State: running
The session list above shows only iqn-good-node, so the iqn-bad-node entries can be safely removed from the /var/lib/iscsi/nodes directory.
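The comparison made by eye here can be scripted: list every target directory under /var/lib/iscsi/nodes that has no matching active session. A minimal sketch, assuming the one-line-per-session output of iscsiadm -m session (without -P), in which the target IQN is the fourth field; find_stale_nodes is a hypothetical name. It only reports candidates; confirming that no LUNs from the target are still in use remains a manual step:

```shell
# Hypothetical helper: print target IQNs that have a node directory but no
# active session. The session list is read on stdin. Assumed line format:
#   tcp: [SID] portal,tpgt IQN (non-flash)
find_stale_nodes() {
    nodes_dir=${1:-/var/lib/iscsi/nodes}
    active=$(awk '{ print $4 }')        # target IQN of every active session
    for path in "$nodes_dir"/*/; do
        [ -d "$path" ] || continue      # glob matched nothing
        iqn=$(basename "$path")
        # -x: whole-line match, -F: fixed string (IQNs contain dots and colons)
        if ! printf '%s\n' "$active" | grep -qxF -- "$iqn"; then
            echo "$iqn"
        fi
    done
}

# Usage:
#   iscsiadm -m session | find_stale_nodes
```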
Action plan:
1. Schedule downtime for a reboot.
2. Remove the stale node directory:
# rm -r /var/lib/iscsi/nodes/iqn-bad-node
This removes the node record for this IQN (the file named after the portal IP and port), so the OS no longer uses it from the next boot on.
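3. Make sure that /etc/fstab holds no entries for the bad IQN (no output means no match):
# grep -i iqn-bad-node /etc/fstab
Only entries that use /dev/disk/by-path names contain the IQN; also check that no entry mounted by UUID or LABEL refers to a disk from the old target.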
4. Reboot the affected server:
# shutdown -r now
5. After the reboot, verify that /var/lib/iscsi/nodes contains only the iqn-good-node entry:
# ls -la /var/lib/iscsi/nodes
6. Verify that the system no longer reports iSCSI issues for the old target. Check dmesg, and because iscsid and iscsiadm log through syslog, also check the journal of the current boot:
# dmesg | grep -i iqn-bad-node
# journalctl -b | grep -i iqn-bad-node
7. Once no errors are detected and the pre-migration IQN no longer appears, continue service as usual.
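The removal and fstab check can be combined into a guarded step that refuses to delete the record while the target still looks in use. A sketch only: remove_stale_node is a hypothetical name, it reads the output of iscsiadm -m session on stdin (target IQN assumed to be the fourth field), and, like the grep on /etc/fstab, it cannot see fstab entries that mount by UUID or LABEL:

```shell
# Hypothetical pre-reboot helper. Arguments: IQN [nodes_dir] [fstab].
# Refuses to remove the node record if the target has an active session or
# is referenced in fstab.
remove_stale_node() {
    iqn=$1
    nodes_dir=${2:-/var/lib/iscsi/nodes}
    fstab=${3:-/etc/fstab}

    # Guard against removing nodes_dir itself or a path outside it.
    case $iqn in
        ''|*/*|.|..) echo "refusing: invalid IQN '$iqn'" >&2; return 2 ;;
    esac
    if awk '{ print $4 }' | grep -qxF -- "$iqn"; then
        echo "refusing: $iqn still has an active session" >&2
        return 1
    fi
    [ -r "$fstab" ] || { echo "refusing: cannot read $fstab" >&2; return 2; }
    if grep -qiF -- "$iqn" "$fstab"; then
        echo "refusing: $iqn is referenced in $fstab" >&2
        return 1
    fi
    if [ ! -d "$nodes_dir/$iqn" ]; then
        echo "nothing to do: $nodes_dir/$iqn does not exist"
        return 0
    fi
    rm -r -- "$nodes_dir/$iqn"
    echo "removed $nodes_dir/$iqn; reboot, then check that it stays gone"
}

# Usage:
#   iscsiadm -m session | remove_stale_node iqn-bad-node
```

Run it before the reboot; the post-reboot checks with ls, dmesg and the journal still apply.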