Overview of the Cluster Logs
MySQL Cluster can be complicated in nature and this requires a more sophisticated method of logging to manage all the information. The logging can be performed at a global level indicating the overall health of the cluster and how the nodes are interacting. There is also a local logging level which gives the status of the specific node and any internal errors that may, or may not, be affecting other nodes.
The global logging of the cluster is done via log events which indicate timestamp, the severity level and a message that all get entered into the log file. The type of events vary from statistic counters through to connection level and error types. The log events give a good indication of the overall health of the cluster and what processes have occurred leading up to an issue if one occurs.
Management Logs
The management node(s) of the MySQL Cluster write to the cluster log identified as ndb_node_id_cluster.log in the data directory by default. Typically the node_id value here would be 1 or 2, but this may vary depending on your choice of node id’s. The cluster log is useful as it provides information for an entire cluster in the one file, although this can make it a bit tricky to read as well. Here is an example output of the cluster log:
.... 2011-09-01 13:33:30 [MgmtSrvr] INFO -- Node 4: Node 26: API mysql-5.1.51 ndb-7.1.9 2011-09-01 13:33:30 [MgmtSrvr] INFO -- Node 4: Node 18: API mysql-5.1.51 ndb-7.1.9 2011-09-01 13:33:31 [MgmtSrvr] INFO -- Node 3: Local checkpoint 98 started. Keep GCI = 258791 oldest restorable GCI = 258792 2011-09-01 13:33:32 [MgmtSrvr] INFO -- Node 4: Start phase 101 completed (node restart) 2011-09-01 13:33:32 [MgmtSrvr] INFO -- Node 3: DICT: unlocked by node 4 for NodeRestart 2011-09-01 13:33:32 [MgmtSrvr] INFO -- Node 4: Started (mysql-5.1.51 ndb-7.1.9) 2011-09-01 13:40:53 [MgmtSrvr] INFO -- Node 5: Data usage is 82(294191 32K pages of total 356352) 2011-09-01 13:40:53 [MgmtSrvr] INFO -- Node 5: Index usage is 84(176477 8K pages of total 209024) 2011-09-01 13:40:55 [MgmtSrvr] INFO -- Node 3: Data usage is 82(294216 32K pages of total 356352) 2011-09-01 13:40:55 [MgmtSrvr] INFO -- Node 3: Index usage is 84(176527 8K pages of total 209024) 2011-09-01 13:42:48 [MgmtSrvr] INFO -- Node 6: Data usage is 82(294505 32K pages of total 356352) 2011-09-01 13:42:48 [MgmtSrvr] INFO -- Node 6: Index usage is 84(176502 8K pages of total 209024) 2011-09-01 13:45:31 [MgmtSrvr] INFO -- Node 4: Data usage is 82(294534 32K pages of total 356352) 2011-09-01 13:45:31 [MgmtSrvr] INFO -- Node 4: Index usage is 84(176506 8K pages of total 209024) 2011-09-01 13:53:24 [MgmtSrvr] INFO -- Node 3: Local checkpoint 98 completed 2011-09-01 13:53:24 [MgmtSrvr] INFO -- Node 6: Suma: asking node 3 to recreate subscriptions on me 2011-09-01 13:53:24 [MgmtSrvr] INFO -- Node 6: Start phase 5 completed (node restart) 2011-09-01 13:53:24 [MgmtSrvr] INFO -- Node 6: Start phase 6 completed (node restart) 2011-09-01 13:53:24 [MgmtSrvr] INFO -- Node 6: Start phase 7 completed (node restart) ....
The information related to cluster logging can also be sent to stdout or using the syslog facility instead of being written to a file. These can be adjusted using the parameter LogDestination which specifies where to send the logging information. If you include the value “FILE:” it defines a filename for logging and also allows the maximum file size and maximum number of files. For example:
LogDestination="FILE:filename=cluster.log,maxsize=1000000,maxfiles=6"
The maxsize string sets the maximum file size in bytes before the log will be rolled over to a new file. When this occurs, the old log is renamed by appending .N to the filename where N is the next incremented value not in use. If there are more than maxfiles existing when the rollover occurs the oldest file will be removed.
Cluster Node Logs
The node log, ndb_node_id_out.log, is created on each of the MySQL Cluster data nodes to show information specific to the local node. The node_id identifies the local node where the log originated. This node log can grow very big and will need management to rotate or clear the old logs.
An example log:
.... 2011-03-23 17:24:17 [ndbd] INFO -- Angel pid: 262 started child: 263 2011-03-23 17:24:17 [ndbd] INFO -- Configuration fetched from 'ndb-mgmd-node1:1186', generation: 1 NDBMT: MaxNoOfExecutionThreads=8 NDBMT: workers=4 threads=4 2011-03-23 17:24:17 [ndbd] INFO -- NDB Cluster -- DB node 3 2011-03-23 17:24:17 [ndbd] INFO -- mysql-5.1.51 ndb-7.1.10 -- 2011-03-23 17:24:17 [ndbd] INFO -- WatchDog timer is set to 6000 ms 2011-03-23 17:24:17 [ndbd] INFO -- numa_set_interleave_mask(numa_all_nodes) : no numa support 2011-03-23 17:24:17 [ndbd] INFO -- Ndbd_mem_manager::init(1) min: 13951Mb initial: 14335Mb Adding 380Mb to ZONE_LO (1,12159) NDBMT: num_threads=7 thr: 0 tid: 5 DBTC(0) DBDIH(0) DBDICT(0) NDBCNTR(0) QMGR(0) NDBFS(0) TRIX(0) DBUTIL(0) thr: 1 tid: 6 BACKUP(0) DBLQH(0) DBACC(0) DBTUP(0) SUMA(0) DBTUX(0) TSMAN(0) LGMAN(0) PGMAN(0) RESTORE(0) DBINFO(0) PGMAN(5) thr: 2 tid: 7 PGMAN(1) DBACC(1) DBLQH(1) DBTUP(1) BACKUP(1) DBTUX(1) RESTORE(1) thr: 3 tid: 8 PGMAN(2) DBACC(2) DBLQH(2) DBTUP(2) BACKUP(2) DBTUX(2) RESTORE(2) thr: 4 tid: 9 PGMAN(3) DBACC(3) DBLQH(3) DBTUP(3) BACKUP(3) DBTUX(3) RESTORE(3) thr: 6 tid: 1 CMVMI(0) thr: 5 tid: 10 PGMAN(4) DBACC(4) DBLQH(4) DBTUP(4) BACKUP(4) DBTUX(4) RESTORE(4) saving 103ae0000 at 100581d20 (0) 2011-03-23 17:24:20 [ndbd] INFO -- Start initiated (mysql-5.1.51 ndb-7.1.10) NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer 2011-03-23 17:24:42 [ndbd] INFO -- timerHandlingLab now: 5031230115 sent: 5031229933 diff: 182 2011-03-23 17:24:42 [ndbd] INFO -- Watchdog: User time: 189 System time: 2412 2011-03-23 17:24:42 [ndbd] WARNING -- Watchdog: Warning overslept 244 ms, expected 100 ms. Adding 7812Mb to ZONE_LO (12160,249983) Adding 6143Mb to ZONE_LO (262145,196575) ....
Rotating the log file of a data node involves copying the current one and zeroing the original. If you do not want the log file at all, you can simply skip copying the file and empty it.
shell> cp ndb_3_out.log ndb_3_out.log.1 shell> cat /dev/null > ndb_3_out.log
Another form of node log is the node error log which is ndb_node_id_error.log by name. This is a summary log format that lists the major errors that have occurred on that node in the MySQL Cluster. This can often be a quick source of information to determine the health of a node in the Cluster. Details in the log include the timestamp, error message, code block, and associated trace files. An example format would be:
.... Time: Wednesday 24 August 2011 - 15:28:49 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: dbdih/DbdihMain.cpp Error object: DBDIH (Line: 9138) 0x00000002 Program: /opt/mysql/mysql/bin/ndbmtd Pid: 21999 thr: 0 Version: mysql-5.1.56 ndb-7.1.15 Trace: /home/mysql/data/logs/ndb_3_trace.log.10 /home/mysql/data/logs/ndb_3_trace.log.10_t1 /home/mysql/data ....
Trace File Logs
The trace files are located on the local node where an error has occurred that required a trace file to be generated. The trace files give an indication of the path taken by the node through the code and where it finished at the time of the error. The log is essentially broken up into to sections with the first part showing the path through the code and the second section showing the signals/events that occurred along that pathway. Here is an example snippet of a trace file:
.... NDBFS 001461 NDBFS 001433 001294 001452 DBTC 004649 NDBFS 001461 001294 DBTC 004649 NDBFS 001461 001294 QMGR 000208 003661 003665 --------------- Signal ---------------- r.bn: 252 "QMGR", r.proc: 3, r.sigId: 618218 gsn: 254 "FAIL_REP" prio: 0 s.bn: 252 "QMGR", s.proc: 6, s.sigId: 1208832226 length: 3 trace: 8 #sec: 0 fragInf: 0 FailedNode: 5, FailCause: 5 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 618217 gsn: 164 "CONTINUEB" prio: 1 s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 618215 length: 1 trace: 0 #sec: 0 fragInf: 0 Scanning the memory channel again with no delay --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 618216 gsn: 264 "FSREADREQ" prio: 1 s.bn: 262/1 "RESTORE", s.proc: 3, s.sigId: 24744752 length: 7 trace: 2 #sec: 0 fragInf: 0 UserPointer: 0 FilePointer: 1484 UserReference: H'03060003 Operation flag: H'00000023 (No sync, Format=List of global pages) List of shared pages) varIndex: 765 numberOfPages: 1 pageData: H'000440bf, --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 618215 gsn: 164 "CONTINUEB" prio: 0 s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 618209 length: 1 trace: 0 #sec: 0 fragInf: 0 Scanning the memory channel every 10ms ....