Most LDOM issues can be resolved by reading the error message carefully. Below are some of the most common LDOM issues, with basic troubleshooting tips for each.
1. Unable to create LDom
2. Unable to install LDom
3. Unable to configure LDom
4. Unable to bind LDom
5. Unable to connect to LDom
How to verify if LDom services are started
The two services below need to be running on the primary domain (also known as the control domain) for LDoms to start and work properly.
1. svc:/ldoms/ldmd:default
2. svc:/ldoms/vntsd:default
Here, ldmd is the Logical Domain Manager service and vntsd is the virtual console service. Each service should be in the online state, not disabled or maintenance. If a service is disabled, enable it with the svcadm enable command. If a service is in maintenance, it has to be cleared with svcadm clear once the underlying problem is fixed. Check the logs below for any abnormalities:
/var/svc/log/ldoms-ldmd:default.log for the ldm logs
/var/svc/log/ldoms-vntsd:default.log for the vntsd logs
Also check the /var/adm/messages file for errors related to LDOMs. Use the ps command to confirm that the ldmd and vntsd processes are running:
# ps -ef | egrep "vntsd|ldmd"
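The service-state checks described above can be sketched as a small shell helper. This is only an illustration: the function name ldom_svc_advice is made up for this example, and it assumes its input has the two-column "STATE FMRI" layout produced by svcs -H -o state,fmri.

```shell
# Hypothetical helper (not part of Solaris or LDoms): reads "STATE FMRI"
# lines on stdin and prints a suggested next step for each service.
ldom_svc_advice() {
    while read -r state fmri; do
        case "$state" in
            online)
                echo "$fmri: ok" ;;
            disabled)
                echo "$fmri: disabled, try: svcadm enable $fmri" ;;
            maintenance)
                echo "$fmri: maintenance, review the log under /var/svc/log, then: svcadm clear $fmri" ;;
            *)
                echo "$fmri: state '$state', inspect with: svcs -xv $fmri" ;;
        esac
    done
}

# Possible use on the control domain (assumption: run as root):
#   svcs -H -o state,fmri ldoms/ldmd ldoms/vntsd | ldom_svc_advice
```

The helper only prints suggestions and changes nothing itself, so it is safe to run before deciding whether to enable or clear a service.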
Below is the default configuration of the ldmd service and the vntsd service :
svc:> select ldoms/ldmd
svc:/ldoms/ldmd> listprop
ldmd                                    application
ldmd/debug                              integer  0
ldmd/hops                               integer  0
ldmd/nocfg                              boolean  false
ldmd/xmpp                               boolean  false
fmd_config                              application
fmd_config/fmd_to_ldmd_init_timeout     integer  20
fmd_config/fmd_to_ldmd_running_timeout  integer  10
filesystem                              dependency
filesystem/entities                     fmri     svc:/system/filesystem/local
filesystem/grouping                     astring  require_all
filesystem/restart_on                   astring  none
filesystem/type                         astring  service
general                                 framework
general/action_authorization            astring  solaris.smf.manage.ldoms
general/entity_stability                astring  Unstable
general/single_instance                 boolean  true
start                                   method
start/exec                              astring  /opt/SUNWldm/bin/ldmd_start
start/timeout_seconds                   count    120
start/type                              astring  method
stop                                    method
stop/exec                               astring  :kill
stop/timeout_seconds                    count    60
stop/type                               astring  method
tm_common_name                          template
tm_common_name/C                        ustring  "Logical Domain Manager"
svc:> select ldoms/vntsd
svc:/ldoms/vntsd> listprop
vntsd                     application
vntsd/listen_addr         astring  localhost
vntsd/timeout_minutes     integer  0
vntsd/vcc_device          astring  virtual-console-concentrator@0
network                   dependency
network/entities          fmri     svc:/milestone/network
network/grouping          astring  optional_all
network/restart_on        astring  error
network/type              astring  service
syslog                    dependency
syslog/entities           fmri     svc:/system/system-log
syslog/grouping           astring  optional_all
syslog/restart_on         astring  none
syslog/type               astring  service
general                   framework
general/entity_stability  astring  Unstable
start                     method
start/exec                astring  /lib/svc/method/svc-vntsd
start/timeout_seco
If you are unable to telnet to a guest LDom from the control domain, you may need to restart the vntsd service:
# svcadm disable /ldoms/vntsd
# svcadm enable /ldoms/vntsd
LDOM console logs
Starting with LDom 3.0, console logs are collected under /var/log/vntsd/*/* (this requires at least Solaris 11 on the primary domain).
# ldm list
NAME     STATE   FLAGS   CONS  VCPU  MEMORY  UTIL  NORM  UPTIME
primary  active  -n-cv-  UART  8     24G     0.2%  0.2%  12d 11h 1m
ldom01   active  -n----  4001  10    2304M   0.1%  0.1%  12d 7h 22m

// console logs on live system:
# ls -l /var/log/vntsd/*/*
-rw------- 1 root root 2812386 Sep 18 10:38 /var/log/vntsd/ldom01/console-log
Ensure proper binding of resources
For resources to be assigned to LDOM domains, they need to be bound to the appropriate domain. This process binds the virtual resources to actual physical resources available on the system. If there are not enough physical resources to allocate, the binding process reports a failure.
One or more of the following errors are observed when there is a problem binding resources:
1. Insufficient VCPUS resources to bind LDom.
2. Not enough free memory present to meet this request.
3. Could not bind requested memory for LDom.
4. Only [number] physical crypto unit resource(s) available to bind to LDom.
5. Didn’t find a suitable vcc service in a bound service domain to bind guest [guestname] console.
Below is a newly created LDom guest named guest01, with resources added but not yet bound. Notice that the LDC fields are blank and that no VCPUs or memory have actually been allocated. The STATE of the guest is shown as inactive.
# ldm list-bindings -e guest01
NAME     STATE     FLAGS   CONS  VCPU  MEMORY  UTIL  UPTIME
guest01  inactive  ------        4     3G

UUID
    1105fe75-b5c6-6447-929c-a8dba81db847

MAC
    00:14:4f:fa:2e:70

CONTROL
    failure-policy=ignore
    extended-mapin-space=off
    cpu-arch=native
    rc-add-policy=
    shutdown-group=15

DEPENDENCY
    master=

CONSTRAINT
    threading=max-throughput

VARIABLES
    auto-boot?=false
    boot-device=/virtual-devices@100/channel-devices@200/disk@0:a disk net
    pm_boot_policy=disabled=1;ttfc=0;ttmr=0;

NETWORK
    NAME   SERVICE       ID  DEVICE  MAC                MODE  PVID  VID  MTU  MAXBW  LINKPROP
    vnet0  primary-vsw0  0           00:14:4f:fb:c2:d0        1                      phys-state
    vnet1  sec1-vsw0     1           00:14:4f:fa:95:92        1                      phys-state

DISK
    NAME           VOLUME                      TOUT  ID  DEVICE  SERVER  MPGROUP
    guest01-root   guest01-root@primary-vds0         0
    guest01-data1  guest01-data1@primary-vds0        1

VLDCC
    NAME  SERVICE                DESC             LDC
    ds    primary-vldc0@primary  domain-services
Once the guest is bound, the actual physical resources are allocated and LDCs (Logical Domain Channels) are assigned. The domain STATE will be shown as ‘bound’.
Memory binding failures
Memory binding failure is reported when there is no additional unallocated memory available for allocation. ldm bind will report the following error:
# ldm bind guest01
Not enough free memory present to meet this request
Could not bind requested memory for LDom guest01
Corrective actions :
1. Review how much unallocated memory is available on the system using the ldm list-devices memory subcommand. Then assign the actual free available memory to the domain.
# ldm list-devices memory
MEMORY
    PA           SIZE
    0x188000000  1920M
# ldm set-memory 1920M guest01
# ldm bind guest01
2. Alternatively, reduce the amount of memory assigned to another domain by using the ldm set-memory or remove-memory subcommand on that domain. Then proceed to bind the domain.
# ldm remove-memory 1g primary
# ldm bind guest01
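Before choosing between the two corrective actions, it can help to total the unallocated memory. The sketch below is an illustration only: the function name free_memory_mb is hypothetical, and it assumes the output of ldm list-devices memory has the PA/SIZE layout shown above, with sizes ending in K, M or G. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
# Hypothetical helper: sums the SIZE column of `ldm list-devices memory`
# output (read from stdin) and prints the total in megabytes.
free_memory_mb() {
    awk '$1 ~ /^0x/ {                      # data rows start with a physical address
        size = $2
        unit = substr(size, length(size), 1)
        num  = substr(size, 1, length(size) - 1) + 0
        if (unit == "G")      total += num * 1024
        else if (unit == "M") total += num
        else if (unit == "K") total += num / 1024
        # rows with any other suffix are ignored in this sketch
    }
    END { printf "%d\n", total }'
}

# Possible use on the control domain:
#   ldm list-devices memory | free_memory_mb
```

If the total is smaller than the memory requested for the guest, either the guest's memory has to be reduced or memory has to be released from another domain first.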
VCPU binding failure
VCPU binding failure is reported when there is no additional unallocated VCPU available for allocation. ldm bind will report the following error:
# ldm bind guest01
Insufficient VCPU resources to bind LDom guest01
The corrective actions are to:
1. Review how many unallocated VCPUs are available on the system using the ldm list-devices vcpu subcommand. Then assign the available free VCPUs to the domain.
# ldm list-devices vcpu
VCPU
    PID  %FREE
    28   100
    29   100
    30   100
    31   100
# ldm set-vcpu 4 guest01
# ldm bind guest01
2. Alternatively, reduce the number of VCPUs assigned to another domain by using the ldm(1M) set-vcpu or remove-vcpu subcommand on that domain. Then proceed to bind the domain.
# ldm remove-vcpu 2 primary
# ldm bind guest01
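The VCPU check can be scripted in a similar way. The following is a minimal sketch, not an official tool: free_vcpu_count and have_free_vcpus are hypothetical names, and the parsing assumes the PID/%FREE layout shown above, where a fully unallocated VCPU reports 100 in the second column.

```shell
# Hypothetical helper: counts VCPUs that are 100% free in the output of
# `ldm list-devices vcpu` (read from stdin).
free_vcpu_count() {
    awk '$1 ~ /^[0-9]+$/ && $2 == 100 { n++ } END { print n + 0 }'
}

# Hypothetical helper: succeeds (exit status 0) when at least $1 VCPUs
# are completely free, based on the same output read from stdin.
have_free_vcpus() {
    needed=$1
    free=$(free_vcpu_count)
    [ "$free" -ge "$needed" ]
}

# Possible use before binding (assumption: guest01 needs 4 VCPUs):
#   if ldm list-devices vcpu | have_free_vcpus 4; then
#       ldm bind guest01
#   else
#       echo "not enough free VCPUs; reduce another domain first"
#   fi
```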
MAU (Cryptographic Unit) binding failure
*NOTE* MAUs are present only on T1, T2 and T3 series systems. T4 systems do not have MAUs; on T4 systems, each core contains a stream processing unit (SPU) that provides cryptographic processing.
MAU binding failure is reported when there is no additional unallocated MAU available for allocation. An additional requirement is that at least one of the VCPUs from the processor core to which the MAU belongs must be assigned to the domain. Despite the MAU binding failure, the bind proceeds with the rest of the resources. ldm bind will report the following error:
# ldm bind guest01
Only 1 physical crypto unit resource(s) available to bind to LDom guest01, proceeding with binding 1 additional crypto unit(s)
The corrective actions are to:
1. Review how many unallocated MAUs are available on the system using the ldm list-devices mau subcommand. Then make sure at least one VCPU from the associated core is assigned to the domain before assigning the available MAUs to the domain.
# ldm list-devices mau
MAU
    ID  CPUSET
    2   (8, 9, 10, 11)
    3   (12, 13, 14, 15)
    4   (16, 17, 18, 19)
    5   (20, 21, 22, 23)
    6   (24, 25, 26, 27)
    7   (28, 29, 30, 31)
# ldm add-vcpu 1 guest01
# ldm add-mau 1 guest01
# ldm bind guest01
2. Alternatively, reduce the number of MAUs assigned to another domain by using the ldm set-mau or remove-mau subcommand on that domain. Here again, take care that one VCPU from the core to which the MAU belongs is assigned to the guest domain. Then proceed to bind the domain.
# ldm remove-mau 1 primary
You may also need to remove one VCPU from the primary domain and assign it to the guest, so that one of the VCPUs from the same core as the MAU is assigned to the guest.
# ldm remove-vcpu 1 primary
# ldm add-vcpu 1 guest01
# ldm bind guest01
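Because a MAU is only usable by a domain that holds a VCPU from the same core, it can be useful to look up which free MAU belongs to a given VCPU. The sketch below is illustrative: mau_for_cpu is a hypothetical name, and the parsing assumes the ID/CPUSET layout shown in the ldm list-devices mau output above. The -v option may require nawk or /usr/xpg4/bin/awk on Solaris.

```shell
# Hypothetical helper: given a VCPU ID as $1 and `ldm list-devices mau`
# output on stdin, prints the ID of the MAU whose CPUSET contains that VCPU.
mau_for_cpu() {
    awk -v cpu="$1" '$1 ~ /^[0-9]+$/ {
        gsub(/[(),]/, " ")                 # "(8, 9, 10, 11)" becomes plain fields
        for (i = 2; i <= NF; i++)
            if ($i == cpu) print $1
    }'
}

# Possible use on the control domain (assumption: VCPU 13 is the one
# that will be assigned to the guest):
#   ldm list-devices mau | mau_for_cpu 13
```

If nothing is printed, no unallocated MAU shares a core with that VCPU, and either a different VCPU or a MAU released from another domain is needed.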
Virtual Console binding failure
The virtual console concentrator (VCC) and virtual console (VCONS) provide console access to the guest domain. The guest console needs to be bound to a virtual console service created on the primary domain. A misconfigured VCC can result in an error during binding:
# ldm bind guest01
The virtual console concentrator service primary-vcc1 not exist or is not bound.
Didn't find a suitable vcc service in a bound service domain to bind guest guest01 console
To verify that the VCC is configured correctly, use the ldm list -l [domain] command to review the configuration:
# ldm list -l guest01
NAME     STATE  FLAGS   CONS  VCPU  MEMORY  UTIL  UPTIME
guest01  bound  ------  5001  4     3G
--lines omitted--
VCONS
    NAME     SERVICE               PORT  LOGGING
    guest01  primary-vcc0@primary  5001  on

# ldm list -l primary
NAME     STATE   FLAGS   CONS  VCPU  MEMORY  UTIL  UPTIME
primary  active  -n-cv-  SP    4     4128M   0.9%  14d 55m
--lines omitted--
VCC
    NAME          PORT-RANGE
    primary-vcc0  5000-5100
--lines omitted--
VCONS
    NAME  SERVICE  PORT  LOGGING
    SP
If it is not configured, use the following steps to configure the console service:
# ldm add-vconscon port-range=5000-5100 primary-vcc0 primary
# ldm set-vconsole port=5000 service=primary-vcc0@primary guest01
Then make sure that the LDom console SMF service is enabled and online:
# svcs vntsd
STATE    STIME     FMRI
offline  13:20:40  svc:/ldoms/vntsd:default
# svcadm enable vntsd
# svcs vntsd
STATE    STIME     FMRI
online   13:39:35  svc:/ldoms/vntsd:default
Data collection for troubleshooting
It is important to provide the right troubleshooting data to Oracle Support in order to get a speedy resolution. Below are a few log files that can be very helpful in troubleshooting LDOM issues.
SMF (Service Management Facility) logs for the LDom-related services on the primary domain
/var/svc/log/ldoms-ldmd:default.log
/var/svc/log/ldoms-vntsd:default.log
/var/svc/log/ldoms-agents:default.log
/var/opt/SUNWldm/ldom-db.xml
SMF (Service Management Facility) logs for the LDom-related services on the guest LDom
/var/svc/log/platform-sun4v-drd:default.log
/var/svc/log/ldoms-agents:default.log
If the ldmd daemon dumps core, the core file will be ‘/var/opt/SUNWldm/core’ unless you have used coreadm to control the core file location. The message ‘Invalid response’ from ldm usually means that the ldmd daemon has dumped core so the core file should be collected in that case.
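Gathering the files listed above into one directory can be sketched as follows. This is only an example: collect_ldom_logs is a hypothetical helper, and the destination /var/tmp/ldom-data in the usage comment is an arbitrary choice, not a location Oracle prescribes.

```shell
# Hypothetical helper: copies each readable file given as an argument into
# the destination directory ($1) and reports files that are missing.
collect_ldom_logs() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ -r "$f" ]; then
            cp -p "$f" "$dest/" && echo "collected $f"
        else
            echo "missing   $f"
        fi
    done
}

# Possible use on the primary domain (paths taken from the lists above):
#   collect_ldom_logs /var/tmp/ldom-data \
#       /var/svc/log/ldoms-ldmd:default.log \
#       /var/svc/log/ldoms-vntsd:default.log \
#       /var/svc/log/ldoms-agents:default.log \
#       /var/opt/SUNWldm/ldom-db.xml \
#       /var/opt/SUNWldm/core
```

A missing core file is normal when ldmd has not crashed, so a "missing" line for it is not an error in itself.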
explorer
Explorer collects a number of important LDom-related outputs on the primary (control) domain:
# /opt/SUNWexplo/bin/explorer -w default,ldom