HP XC System 3.x Software User's Guide

Pseudo-parallel job: A job that requests only one slot but specifies any of these constraints:

• mem

• tmp

• nodes=1

• mincpus > 1

Pseudo-parallel jobs are allocated one node for their exclusive use.

NOTE: Do NOT rely on this feature to provide node-level allocation for small jobs in job scripts. Use the SLURM[nodes] specification instead, along with the mem, tmp, and mincpus allocation options.

LSF-HPC considers this job type a parallel job because the job requests explicit node resources. LSF-HPC does not monitor these additional resources, so it cannot schedule any other job on the node without risking resource contention. Therefore, LSF-HPC allocates the appropriate whole node for the exclusive use of the serial job, in the same manner as it does for parallel jobs; hence the name "pseudo-parallel".

Parallel job: A job that requests more than one slot, regardless of any other constraints.

Parallel jobs are allocated up to the maximum number of nodes given by the following specifications:

• SLURM[nodes=min-max] (if specified)

• SLURM[nodelist=node_list] (if specified)

• bsub -n

Parallel jobs and serial jobs cannot run on the same node.

Small job: A parallel job that can potentially fit into a single node and does not explicitly request more than one node (through a SLURM[nodes] or SLURM[nodelist] specification). LSF-HPC tries to allocate a single node for a small job.
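The job-type definitions above amount to a small decision procedure. The following Python sketch applies them to one request and also builds the kind of explicit node-level request that the NOTE recommends for small jobs. It is a minimal illustration under stated assumptions, not LSF-HPC code: `JobRequest`, `classify`, `bsub_argv`, and `cpus_per_node` are hypothetical names; the "serial" label for a one-slot job without constraints follows the contrast drawn in the pseudo-parallel definition; "explicitly requests more than one node" is interpreted as a nodes minimum above 1 or a multi-node nodelist; and the `-ext "SLURM[opt;opt]"` submission form (options separated by semicolons) is assumed and should be checked against your LSF-HPC documentation.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class JobRequest:
    """Hypothetical summary of one submission; not an LSF-HPC data structure."""
    slots: int                                          # bsub -n
    nodes: Optional[Tuple[int, int]] = None             # SLURM[nodes=min-max]; nodes=1 -> (1, 1)
    nodelist: List[str] = field(default_factory=list)   # SLURM[nodelist=...]
    mincpus: Optional[int] = None                       # SLURM[mincpus=...]
    mem: Optional[int] = None                           # SLURM[mem=...]
    tmp: Optional[int] = None                           # SLURM[tmp=...]


def classify(job: JobRequest, cpus_per_node: int) -> str:
    """Apply the job-type definitions to one request."""
    if job.slots <= 1:
        explicit_node_resources = (
            job.mem is not None
            or job.tmp is not None
            or job.nodes == (1, 1)
            or (job.mincpus or 1) > 1
        )
        # One slot plus explicit node resources: a whole node, used exclusively.
        return "pseudo-parallel" if explicit_node_resources else "serial"

    # More than one slot is always parallel. Assumption: "explicitly requests more
    # than one node" means a nodes minimum above 1 or a nodelist with several nodes.
    many_nodes = (job.nodes is not None and job.nodes[0] > 1) or len(job.nodelist) > 1
    if job.slots <= cpus_per_node and not many_nodes:
        return "small parallel"
    return "parallel"


def bsub_argv(job: JobRequest, command: List[str]) -> List[str]:
    """Build a bsub argument list; assumes the -ext "SLURM[opt;opt]" form."""
    opts = []
    if job.nodes is not None:
        lo, hi = job.nodes
        opts.append(f"nodes={lo}" if lo == hi else f"nodes={lo}-{hi}")
    if job.nodelist:
        opts.append("nodelist=" + ",".join(job.nodelist))  # assumed comma-separated
    for name in ("mincpus", "mem", "tmp"):
        value = getattr(job, name)
        if value is not None:
            opts.append(f"{name}={value}")
    argv = ["bsub", "-n", str(job.slots)]
    if opts:
        argv += ["-ext", "SLURM[" + ";".join(opts) + "]"]
    return argv + command


if __name__ == "__main__":
    # A small job that pins itself to one node with SLURM[nodes], as the NOTE advises.
    job = JobRequest(slots=2, nodes=(1, 1))
    print(classify(job, cpus_per_node=4))
    print(shlex.join(bsub_argv(job, ["./my_job"])))
```

With the hypothetical `./my_job` script, the example prints `small parallel` followed by `bsub -n 2 -ext 'SLURM[nodes=1]' ./my_job`. Building an argument list rather than a single string avoids shell-quoting problems with the brackets and semicolons; `shlex.join` is used only to display it.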

10.5 Using LSF-HPC Integrated with SLURM in the HP XC Environment

This section provides additional information about using LSF-HPC in the HP XC environment.

10.5.1 Useful Commands

The following commands are useful with LSF-HPC integrated with SLURM:

• Use the bjobs -l and bhist -l commands to see the components of the actual SLURM allocation command.

• Use the bkill command to kill jobs.

• Use the bjobs command to monitor job status in LSF-HPC integrated with SLURM.

• Use the bqueues command to list the configured job queues in LSF-HPC integrated with SLURM.
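The monitoring commands above can be scripted. The following Python sketch assembles them as argument lists for a single job and either shows them (dry run, the default) or runs them and collects their output. It assumes the LSF-HPC commands are on `PATH` when run for real; `inspection_commands`, `run_commands`, and the job ID `1234` are hypothetical.

```python
import subprocess
from typing import List


def inspection_commands(job_id: str) -> List[List[str]]:
    """Argument lists for the monitoring commands listed above."""
    return [
        ["bjobs", job_id],         # current status of the job
        ["bjobs", "-l", job_id],   # long format, includes the SLURM allocation command
        ["bhist", "-l", job_id],   # long-format job history
        ["bqueues"],               # configured job queues
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Return each command line (dry run) or each command's standard output."""
    results = []
    for argv in commands:
        if dry_run:
            results.append(" ".join(argv))
        else:
            # Requires the LSF-HPC commands to be installed and on PATH.
            completed = subprocess.run(argv, capture_output=True, text=True, check=False)
            results.append(completed.stdout)
    return results


if __name__ == "__main__":
    for line in run_commands(inspection_commands("1234")):
        print(line)
```

bkill is deliberately left out of this helper so that a monitoring script cannot terminate a job by accident.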

10.5.2 Job Startup and Job Control

When LSF-HPC starts a SLURM job, it sets SLURM_JOBID to associate the job with the SLURM allocation.

While a job is running, all of the operating-system-enforced resource limits that LSF-HPC supports remain supported, including the core limit, CPU time limit, data limit, file size limit, memory limit, and stack limit. If the user kills a job, LSF-HPC propagates signals to the entire job, including the job file running on the local node and all tasks running on remote nodes.
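A job script can use both behaviors described above: it can read SLURM_JOBID to learn which SLURM allocation LSF-HPC associated with it, and it can catch termination signals to clean up before exiting. The following Python sketch shows both. The function names are hypothetical, and the assumption that the script receives SIGINT or SIGTERM is only an assumption: which signals actually arrive depends on how the job is killed and on the LSF configuration, and SIGKILL cannot be caught at all.

```python
import os
import signal
import sys
from typing import Callable, Mapping, Optional


def slurm_allocation_id(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Return the SLURM allocation ID that LSF-HPC set for this job, or None."""
    env = os.environ if env is None else env
    value = env.get("SLURM_JOBID", "")
    return int(value) if value.isdigit() else None


def install_cleanup_handlers(cleanup: Callable[[], None]) -> None:
    """Run cleanup() and exit if the script receives SIGINT or SIGTERM.

    Assumption: termination reaches this process as SIGINT or SIGTERM first;
    SIGKILL cannot be handled, so cleanup is best effort only.
    """
    def handler(signum, frame):
        cleanup()
        sys.exit(128 + signum)

    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)


if __name__ == "__main__":
    install_cleanup_handlers(lambda: print("cleaning up scratch files"))
    print("SLURM allocation:", slurm_allocation_id())
```

Passing an explicit mapping to `slurm_allocation_id` keeps it testable outside a real job; inside a job it reads the process environment.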

10.5.3 Preemption

LSF-HPC uses the SLURM "node share" feature to facilitate preemption. When a low-priority job is preempted, its processes are suspended on the allocated nodes, and LSF-HPC places the high-priority job on the same nodes. After the high-priority job completes, LSF-HPC resumes the suspended low-priority jobs.
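The preemption sequence above (suspend the low-priority job in place, run the high-priority job on the same node, then resume) can be illustrated with a toy state model. The following Python sketch is only a conceptual illustration of that ordering; `Job`, `SharedNode`, and the state names are hypothetical and do not reflect how LSF-HPC or SLURM implement node sharing internally. In particular, resuming only once no job on the node is running is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    name: str
    priority: int
    state: str = "PENDING"


@dataclass
class SharedNode:
    """Toy model of one node under preemption; not LSF-HPC code."""
    jobs: List[Job] = field(default_factory=list)

    def start(self, job: Job) -> None:
        job.state = "RUNNING"
        self.jobs.append(job)

    def preempt_with(self, high: Job) -> None:
        # Suspend lower-priority jobs in place, then start the high-priority
        # job on the same node; suspended jobs keep their place on the node.
        for job in self.jobs:
            if job.state == "RUNNING" and job.priority < high.priority:
                job.state = "SUSPENDED"
        self.start(high)

    def finish(self, job: Job) -> None:
        job.state = "DONE"
        self.jobs.remove(job)
        # Simplifying assumption: resume suspended jobs once nothing is running.
        if not any(j.state == "RUNNING" for j in self.jobs):
            for j in self.jobs:
                if j.state == "SUSPENDED":
                    j.state = "RUNNING"


if __name__ == "__main__":
    node = SharedNode()
    low, high = Job("low", priority=1), Job("high", priority=10)
    node.start(low)
    node.preempt_with(high)
    print(low.state, high.state)
    node.finish(high)
    print(low.state, high.state)
```

Run as a script, it prints `SUSPENDED RUNNING` and then `RUNNING DONE`, mirroring the suspend, run, and resume order described in the text.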
