scontrol is the command-line tool in the Slurm workload manager for viewing and modifying the state of a Slurm cluster. It provides commands for inspecting and managing jobs, nodes, partitions, and other aspects of the Slurm system.
With scontrol, you can view detailed information about the state and configuration of jobs on the cluster. This includes information such as the job ID, the user who submitted the job, the state of the job (e.g., running, completed, failed), and the nodes and resources allocated to the job.
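Because `scontrol show job` prints this information as whitespace-separated `Key=Value` pairs, individual fields are easy to pull out in a script. The following is a minimal sketch, not an official Slurm utility: `job_field` is a hypothetical helper, and the sample text is an abbreviated illustration of the output format rather than real cluster output.

```shell
# job_field KEY: read `scontrol show job` style text on stdin and print
# the value of the first KEY=value pair found.
# (Hypothetical helper for illustration; not part of Slurm.)
job_field() {
    awk -v key="$1" '
        !found {
            for (i = 1; i <= NF; i++)
                if (index($i, key "=") == 1) {
                    print substr($i, length(key) + 2)
                    found = 1
                    break
                }
        }'
}

# Abbreviated, made-up sample; real output contains many more fields.
sample_output='JobId=1234 JobName=train
   UserId=alice(1001) GroupId=users(100)
   JobState=RUNNING Reason=None
   NodeList=node[01-02] NumNodes=2 NumCPUs=16'

printf '%s\n' "$sample_output" | job_field JobState   # prints RUNNING
```

On a real cluster the same helper could be fed with `scontrol show job job_id | job_field JobState`. Values that contain spaces (a job name, for instance) are cut at the first space, so this sketch suits simple fields only.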
scontrol Command Examples
1. Show information for a job:
# scontrol show job job_id
2. Suspend a comma-separated list of running jobs:
# scontrol suspend job_id1,job_id2,...
3. Resume a comma-separated list of suspended jobs:
# scontrol resume job_id1,job_id2,...
4. Hold a comma-separated list of queued jobs (use the `release` command to permit the jobs to be scheduled again):
# scontrol hold job_id1,job_id2,...
5. Release a comma-separated list of held jobs:
# scontrol release job_id1,job_id2,...
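Since these four commands all take one comma-separated argument, a small wrapper can build that argument from separate job IDs. This is a sketch under stated assumptions: `join_ids`, `job_action`, and the `DRY_RUN` variable are hypothetical names introduced here, not Slurm features, and the wrapper defaults to printing the command instead of running it.

```shell
# join_ids ID...: join the arguments with commas, e.g. 101 102 -> 101,102
join_ids() {
    ( IFS=,; printf '%s\n' "$*" )
}

# job_action ACTION ID...: apply suspend, resume, hold, or release to jobs.
# DRY_RUN is a variable of this sketch only (default 1 = just print).
job_action() {
    action=$1
    shift
    case $action in
        suspend|resume|hold|release) ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "scontrol $action $(join_ids "$@")"
    else
        scontrol "$action" "$(join_ids "$@")"
    fi
}
```

For example, `job_action hold 101 102` prints `scontrol hold 101,102`; setting `DRY_RUN=0` would run the command for real, subject to the usual Slurm permissions.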
In addition to job management, scontrol also provides commands for managing nodes and partitions on the cluster. For example, you can use scontrol to view the state of nodes, including their allocated CPUs and memory, and to set or modify partition attributes such as the maximum run time or the maximum number of nodes a job may use.
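As a sketch of the node side, the `-o` (one line per record) option of scontrol makes `show node` output easy to summarize. The helper name `node_states` and the sample records below are assumptions made for illustration; real records carry many more fields.

```shell
# node_states: read `scontrol -o show node` style text (one node per line)
# on stdin and print "name state" for each node.
# (Hypothetical helper for illustration; not part of Slurm.)
node_states() {
    awk '{
        name = ""; state = ""
        for (i = 1; i <= NF; i++) {
            if (index($i, "NodeName=") == 1) name = substr($i, 10)
            if (index($i, "State=") == 1) state = substr($i, 7)
        }
        if (name != "") print name, state
    }'
}

# Abbreviated, made-up sample records.
sample_nodes='NodeName=node01 CPUAlloc=8 CPUTot=16 RealMemory=64000 AllocMem=32000 State=MIXED Partitions=debug
NodeName=node02 CPUAlloc=0 CPUTot=16 RealMemory=64000 AllocMem=0 State=IDLE Partitions=debug'

printf '%s\n' "$sample_nodes" | node_states
```

On a cluster, `scontrol -o show node | node_states` would feed it live data. A partition attribute can be changed with a command of the form `scontrol update PartitionName=debug MaxTime=02:00:00`, where `debug` is a placeholder partition name.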
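Overall, scontrol covers both inspection and control of jobs and resources on a Slurm cluster, which makes it useful to Slurm users and administrators alike. Most users can run the `show` commands, while operations that change cluster state, such as suspending jobs or updating partitions, generally require operator or administrator privileges.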