Motivation
During the investigation of DPU offload, Slurm is used as the workload management system to integrate
with offload daemons. This document briefly describes how to configure Slurm prolog/epilog scripts so that
offload daemons can be hooked into them in the future.
Prolog Script
Create the Prolog and PrologSlurmctld scripts for Slurm. For now, these scripts only print their arguments and
the environment variables they receive, to show which values are available for subsequent operations.
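$ cat << 'EOF' | sudo tee /opt/hpc-cloud/prolog-ctld.sh
#!/bin/bash
echo "$@" > /opt/hpc-cloud/prolog-ctld.log
env >> /opt/hpc-cloud/prolog-ctld.log
EOF
$ cat << 'EOF' | sudo tee /opt/hpc-cloud/prolog.sh
#!/bin/bash
echo "$@" > /opt/hpc-cloud/prolog.log
env >> /opt/hpc-cloud/prolog.log
EOF
$ sudo chmod 755 /opt/hpc-cloud/prolog-ctld.sh /opt/hpc-cloud/prolog.sh
Quoting the 'EOF' delimiter keeps "$@" from being expanded by the current shell when the files are written,
and the chmod makes the scripts executable so that Slurm can run them.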
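The scripts above overwrite a single log file on every job and record the whole environment. If per-job records are preferred, a minimal sketch of a helper that keeps one file per job and only the SLURM-prefixed variables could look like this. The function name log_slurm_env and the prolog-<jobid>.log naming are illustrative assumptions, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): record the SLURM* variables
# that a prolog receives.
#   $1 = log directory (e.g. /opt/hpc-cloud)
#   $2 = log file prefix (e.g. prolog or prolog-ctld)
log_slurm_env() {
    local dir="$1" prefix="$2"
    # One file per job, so a later job does not overwrite an earlier record.
    local file="${dir}/${prefix}-${SLURM_JOB_ID:-unknown}.log"
    # Keep only variables whose names start with SLURM (this includes
    # SLURMD_NODENAME), sorted so two logs are easy to compare.
    env | grep '^SLURM' | sort > "$file"
}
```

A prolog script could define this helper, call it as log_slurm_env /opt/hpc-cloud prolog, and end with exit 0: according to the slurm.conf documentation, a Prolog that exits with a non-zero code causes the node to be drained and the job to be requeued.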
Update slurm.conf
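After creating the scripts, update slurm.conf on both the control plane and the compute node to enable
Prolog and PrologSlurmctld. Prolog is executed by slurmd on each compute node, while PrologSlurmctld is
executed by slurmctld on the control plane, so each script must exist at its configured path on that host.
Add the following lines: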
Prolog=/opt/hpc-cloud/prolog.sh
PrologSlurmctld=/opt/hpc-cloud/prolog-ctld.sh
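Slurm runs these paths directly, so they must point at executable files. A rough pre-restart check, assuming the path of slurm.conf is passed as an argument, might look like the following. check_prolog_conf is a hypothetical helper; it only inspects simple Key=Value lines and does not follow Include directives.

```shell
#!/bin/bash
# Hypothetical pre-restart check: verify that the Prolog and PrologSlurmctld
# entries in a slurm.conf point at executable files on this host.
#   $1 = path to slurm.conf (e.g. /etc/slurm-llnl/slurm.conf)
# Prints one status line per key; returns 1 if any entry is missing or
# not executable.
check_prolog_conf() {
    local conf="$1" key path status=0
    for key in Prolog PrologSlurmctld; do
        # Parameter names in slurm.conf are case-insensitive, hence grep -i.
        # "^Prolog=" does not match "PrologSlurmctld=" because "=" must
        # directly follow the key. If a key appears more than once, the
        # last line is used; surrounding whitespace is stripped.
        path=$(grep -i "^[[:space:]]*${key}=" "$conf" | tail -n 1 | cut -d= -f2- | tr -d '[:space:]')
        if [ -z "$path" ]; then
            echo "missing: $key"
            status=1
        elif [ ! -x "$path" ]; then
            echo "not executable: $key=$path"
            status=1
        else
            echo "ok: $key=$path"
        fi
    done
    return "$status"
}
```

Usage: check_prolog_conf /etc/slurm-llnl/slurm.conf. On a host that runs only slurmd, only the Prolog line needs to report ok.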
Restart control plane
$ sudo systemctl restart slurmctld
Restart compute node
$ sudo systemctl restart slurmd
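After the restart, scontrol show config reports the settings that the running daemons use. A small filter for its output, assuming the "Key   = Value" layout that command prints, could be used to confirm the new entries were picked up. The function name show_prolog_settings is hypothetical.

```shell
#!/bin/bash
# Hypothetical filter: read "scontrol show config" output on stdin and print
# only the Prolog and PrologSlurmctld entries as Key=Value.
# Assumes each line has the form "Key<spaces>= Value".
show_prolog_settings() {
    # The field separator " *= *" splits "Prolog      = /path" into the key
    # and the value; exact key matching skips PrologFlags and similar keys.
    awk -F' *= *' '$1 == "Prolog" || $1 == "PrologSlurmctld" { print $1 "=" $2 }'
}
```

Usage: $ scontrol show config | show_prolog_settings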
Verification
Submit a batch job to Slurm and check the output of the Prolog and PrologSlurmctld scripts. The output shows
that only pre-defined Slurm environment variables are available in these scripts; other variables set for the
job, e.g. DCM_CHARTS=ucc, are not passed to them.
$ cat << EOF | tee sleep.sh
#!/bin/bash
#SBATCH -o job.%j.out
#SBATCH -p dev
#SBATCH --qos=low
#SBATCH -J hpc-test
#SBATCH -c 2
#SBATCH -n 5
#SBATCH --export=DCM_CHARTS=ucc
/usr/bin/sleep 10
env
EOF
$ sbatch sleep.sh
$ cat /opt/hpc-cloud/prolog.log
SLURM_JOB_USER=klausm
SLURM_JOB_UID=47906
SLURMD_NODENAME=hpc-cloud01
SLURM_CLUSTER_NAME=openbce
PWD=/var/log
SLURM_JOB_PARTITION=dev
SLURM_JOBID=16
SLURM_JOB_CONSTRAINTS=(null)
SLURM_SCRIPT_CONTEXT=prolog_slurmd
SLURM_NODELIST=hpc-cloud01
SLURM_STEP_ID=4294967294
SHLVL=1
SLURM_UID=47906
SLURM_JOB_ID=16
SLURM_CONF=/etc/slurm-llnl/slurm.conf
_=/usr/bin/env
$ cat /opt/hpc-cloud/prolog-ctld.log
SLURM_JOB_USER=klausm
SLURM_JOB_UID=47906
SLURM_CLUSTER_NAME=openbce
PWD=/var/log
SLURM_JOB_PARTITION=dev
SLURM_JOBID=16
SLURM_SCRIPT_CONTEXT=prolog_slurmctld
SLURM_JOB_ACCOUNT=(null)
SHLVL=1
SLURM_JOB_ID=16
SLURM_JOB_NAME=hpc-test
SLURM_JOB_GROUP=dip
SLURM_JOB_GID=30
SLURM_JOB_NODELIST=hpc-cloud01
_=/usr/bin/env
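The two logs differ: for example, SLURM_JOB_NAME appears only in the PrologSlurmctld log, while SLURMD_NODENAME appears only in the Prolog log. To list such differences mechanically, a sketch using comm over the sorted variable names could be used; compare_prolog_logs and var_names are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers for comparing two prolog logs.
# var_names: print the sorted, unique variable names in a log file.
# Lines without "=" (such as the echoed "$@" line) are ignored.
var_names() {
    grep '=' "$1" | cut -d= -f1 | sort -u
}

# compare_prolog_logs: print names found only in the first log, then names
# found only in the second. comm requires sorted input, which var_names provides.
compare_prolog_logs() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    var_names "$1" > "$a"
    var_names "$2" > "$b"
    echo "only in $1:"
    comm -23 "$a" "$b"
    echo "only in $2:"
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}
```

Usage: $ compare_prolog_logs /opt/hpc-cloud/prolog.log /opt/hpc-cloud/prolog-ctld.log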