The Problem
The RDS module does not load after the cluster nodes are rebooted, so CRS cannot run on any of the nodes. Trying to load the module manually displays the following error:
# modprobe rds_rdma
FATAL: Error inserting rds_rdma (/lib/modules/2.6.18-274.18.1.0.1.el5/updates/net/rds/rds_rdma.ko): Unknown symbol in module, or unknown parameter (see dmesg)
dmesg output shows the following entries:
rds_rdma: Unknown symbol rds_cong_map_updated
rds_rdma: Unknown symbol rds_conn_drop
rds_rdma: Unknown symbol rds_message_addref
rds_rdma: Unknown symbol rds_trans_unregister
rds_rdma: Unknown symbol rds_info_deregister_func
rds_rdma: Unknown symbol rds_send_get_message
rds_rdma: Unknown symbol rds_for_each_conn_info
rds_rdma: Unknown symbol rds_message_add_rdma_dest_extension
rds_rdma: Unknown symbol rds_wq
rds_rdma: Unknown symbol rds_atomic_send_complete
rds_rdma: Unknown symbol rds_conn_connect_if_down
rds_rdma: Unknown symbol rds_conn_destroy
When this issue arises, the Cluster Synchronization Services (CSS) daemon “ocssd” does not start, which prevents Grid Infrastructure (GI) from starting fully. The following entries are logged in the CSS daemon trace file “ocssd.trc”:
2017-10-25 20:13:23.776120 : SKGFD:922437376: ERROR: -8(OS Error -1 (open,sskgxplp,Invalid protocol requested (2) or protocol not loaded.,Error 0)
2017-10-25 20:13:23.776127 : SKGFD:922437376: ERROR: -10(OSS Operation oss_initialize failed with error 4 [Network initialization failed]
The Solution
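The issue is caused by the line “install rds /bin/true” in the /etc/modprobe.d/network.conf file. An install directive tells modprobe to run the given command instead of inserting the module, so every request to load rds runs /bin/true and nothing is actually loaded. This works like a blacklist of that module, but with higher precedence. Because rds_rdma depends on rds, it is then inserted without the symbols that rds exports, which produces the “Unknown symbol” errors shown above.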
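To confirm that such an override is present on a node, the configuration files can be searched for it. The following is a minimal sketch, not an official tool: the function name find_rds_override is hypothetical, and it assumes the override lives in a .conf file under the directory given as the first argument (default /etc/modprobe.d).

```shell
# Hypothetical helper: print every active (uncommented) "install rds ..." line.
# $1 = directory to search; assumed default is /etc/modprobe.d.
find_rds_override() {
    dir="${1:-/etc/modprobe.d}"
    # -H prints the file name, -n the line number, -E enables extended regex.
    # Lines starting with "#" do not match, so commented-out entries are skipped.
    grep -HnE '^[[:space:]]*install[[:space:]]+rds[[:space:]]' "$dir"/*.conf 2>/dev/null
}
```

Running find_rds_override with no argument prints matches in the form file:line:text and returns a non-zero status when no override is found. The effective configuration can also be inspected with "modprobe -c | grep rds".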
To solve the problem, perform either of the following actions. The objective of both is to get rid of the “install rds /bin/true” line so that the module can load after every system reboot.
1. Remove the file /etc/modprobe.d/network.conf, or move it to another directory such as /tmp.
or
2. Comment out the line in the /etc/modprobe.d/network.conf file, as in the example below:
# install rds /bin/true
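Option 2 can also be scripted when several nodes need the same change. This is only a sketch under stated assumptions: the function name comment_out_rds_override and the default backup directory /root are hypothetical choices, and it relies on GNU sed for the -i option. The backup is deliberately written outside /etc/modprobe.d, since a copy left in that directory could still be read by modprobe.

```shell
# Hypothetical helper: comment out the "install rds ..." line in a modprobe config file.
# $1 = config file (assumed default /etc/modprobe.d/network.conf)
# $2 = backup directory outside /etc/modprobe.d (assumed default /root)
comment_out_rds_override() {
    conf="${1:-/etc/modprobe.d/network.conf}"
    backup_dir="${2:-/root}"
    backup="$backup_dir/$(basename "$conf").bak"
    # Keep the first backup so that a second run does not overwrite the original.
    if [ ! -e "$backup" ]; then
        cp -p "$conf" "$backup" || return 1
    fi
    # Prefix active "install rds <command>" lines with "# ", preserving indentation.
    # Already commented lines do not match, so the edit is safe to repeat.
    sed -i 's|^\([[:space:]]*\)\(install[[:space:]][[:space:]]*rds[[:space:]].*\)$|\1# \2|' "$conf"
}
```

Run as root, comment_out_rds_override with no arguments edits /etc/modprobe.d/network.conf and leaves the original copy in /root/network.conf.bak.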
Then reboot the system and make sure rds is loaded after the reboot. If rds is not loaded, load the module by running:
# modprobe rds_rdma
or
Run the following commands:
# depmod -ae current_kernel_version_running    -------> for example, 2.6.18-274.18.1.0.1.el5
# modprobe rds_rdma
# reboot
Once the rds module is properly loaded, CRS can be started on all nodes of the cluster.
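Before starting CRS, it can help to confirm on each node that both modules are really present. The sketch below is one way to do that. The function name rds_modules_loaded is hypothetical, and it assumes the kernel's module list is readable from /proc/modules, where each line begins with the module name followed by a space.

```shell
# Hypothetical helper: succeed only if both rds and rds_rdma appear in the module list.
# $1 = module list file (default /proc/modules).
rds_modules_loaded() {
    modules_file="${1:-/proc/modules}"
    for m in rds rds_rdma; do
        # The trailing space in the pattern keeps "rds" from matching "rds_rdma".
        if ! grep -q "^$m " "$modules_file"; then
            echo "$m is not loaded" >&2
            return 1
        fi
    done
    echo "rds and rds_rdma are loaded"
}
```

A non-zero return status means that at least one of the two modules is missing. The same information is available interactively from "lsmod | grep rds".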