srun: command not found

srun is the command-line tool that runs a job or job step on a SLURM-managed HPC cluster. It is commonly used to start an interactive job or to run commands inside an existing job. An interactive job gives the user direct access to a compute node, either for debugging or for running programs that require user input.

When used to create an interactive job, srun (with the --pty option) launches a shell on a compute node, allowing the user to execute commands interactively. This is useful for debugging and testing. srun requests the resources named on the command line (such as CPU cores, memory, and GPUs), falls back to the site defaults for anything not specified, and starts the job on the requested or default partition once the scheduler grants the allocation.

When used to connect to an existing job, srun (with the --jobid option) runs commands as a new job step on the resources already allocated to that job. This can be useful for debugging a running job or checking its progress on the node where it runs.

srun provides options for customizing the resources requested for the job, including the number of CPU cores, the amount of memory, and the number of GPUs. srun can also launch parallel applications: it starts multiple tasks for MPI programs and can reserve several cores per task for multithreaded programs such as those using OpenMP.
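As a rough sketch of how these options fit together, the shell snippet below assembles an srun command line from a few resource settings and prints it instead of running it. The flags --ntasks, --cpus-per-task, --mem-per-cpu, and --time are standard srun options. The values, the helper name build_srun_cmd, and the program ./my_app are placeholders chosen for illustration, and the limits your cluster accepts may differ.

```shell
#!/bin/sh
# Illustrative resource settings; adjust to what your cluster allows.
NTASKS=4              # number of tasks (for example, MPI ranks)
CPUS_PER_TASK=2       # cores per task (for example, OpenMP threads)
MEM_PER_CPU=2G        # memory per allocated core
TIME_LIMIT=00:30:00   # wall-clock limit, HH:MM:SS

# build_srun_cmd is a hypothetical helper, not part of SLURM.
# It prints the srun command line for the program given as arguments.
build_srun_cmd() {
    echo "srun --ntasks=$NTASKS --cpus-per-task=$CPUS_PER_TASK --mem-per-cpu=$MEM_PER_CPU --time=$TIME_LIMIT $*"
}

# ./my_app is a placeholder program name.
build_srun_cmd ./my_app
```

Printing the command first makes it easy to review the request. Once the line looks right, it can be pasted into a shell on the login node.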

If you see the following error when running srun:

srun: command not found

the SLURM client tools are either not installed or not on your PATH. Install the package for your distribution:

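Distribution    Command
Debian          apt-get install slurm-client
Ubuntu          apt-get install slurm-client
Kali Linux      apt-get install slurm-client
Fedora          dnf install slurm
OS X            brew install slurm
Raspbian        apt-get install slurm-client

Note: the Homebrew formula named slurm appears to be a network load monitor rather than the SLURM workload manager, so the OS X command may not provide srun.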
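Before installing anything, it may be worth confirming that srun is really missing and not just absent from your PATH. The snippet below is a minimal sketch of that check; check_cmd is a hypothetical helper name and not part of SLURM.

```shell
#!/bin/sh
# check_cmd is a hypothetical helper: it reports whether a command
# can be found on PATH and returns non-zero when it cannot.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Print a hint only when srun is missing.
check_cmd srun || echo "Install the SLURM client package listed for your distribution."
```

If srun is installed in a location that is not on PATH, adding that directory to PATH is enough and no package needs to be installed.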

srun Command Examples

1. Submit a basic interactive job:

# srun --pty /bin/bash

2. Submit an interactive job with specific resource requests (replace num_cores and memory_MB with numbers):

# srun --ntasks-per-node=num_cores --mem-per-cpu=memory_MB --pty /bin/bash

3. Connect to a compute node where one of your jobs is already running (replace job_id with the job's ID):

# srun --jobid=job_id --pty /bin/bash
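Once the shell from example 1 or 3 starts, it can be hard to tell whether you are on a compute node or still on the login node. SLURM sets the SLURM_JOB_ID environment variable for processes it launches, so a small check like the one below can confirm it. The helper name in_slurm_job is made up for this sketch.

```shell
#!/bin/sh
# in_slurm_job is a hypothetical helper: it succeeds when the shell
# is running inside a SLURM job, detected via SLURM_JOB_ID.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "inside SLURM job $SLURM_JOB_ID on node $(uname -n)"
else
    echo "not inside a SLURM job"
fi
```

The job ID printed here is the same value that --jobid expects in example 3.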

Summary

In summary, srun launches jobs and job steps through the SLURM scheduler on HPC clusters. It can start interactive jobs and run commands inside running jobs, which makes it useful for debugging and monitoring work on the cluster.
