Starting with Condor

Starting a Job with Condor

On a cluster, jobs do not run interactively. You need to package your job into a script (for example, a shell script) so that the whole job can be started with a single, non-interactive command.

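For example, if you want to go to a specific directory and list all the files in it, you have to put those two commands in one script. The script below (myscript.sh) follows the same idea with a few other commands: it prints the name of the worker node and the current time, waits one minute, then prints the time again: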

#!/bin/bash
hostname # print the name of the worker node running the job
date # print the current time
sleep 60 # wait 1 min
date # print the current time
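The directory-listing case mentioned above can be packed into a script the same way. The sketch below is a hypothetical version: the directory is taken from the script's first argument and falls back to /tmp when none is given. Both choices are assumptions for illustration, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical example: directory to list comes from the first argument,
# falling back to /tmp (an arbitrary choice) when no argument is given.
target_dir="${1:-/tmp}"

# Go to the directory; if it does not exist, write a message to standard
# error and stop with a non-zero status so the failure is visible.
cd "$target_dir" || { echo "cannot enter $target_dir" >&2; exit 1; }

pwd      # print the directory we are in
ls -la   # list all files in it, including hidden ones
```

When the script runs as a cluster job, the directory could be passed with an arguments line in the submit file, and the listing ends up in the job's output file instead of on your screen.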

Once the script is ready, you can ask the cluster to run it.

Because a job on the cluster is not interactive, you also need to create a job description file, which tells the batch system what to run and where to store everything your job would normally print on the screen. For the batch system we use (HTCondor), this description file (myscript.sub) looks like:

executable = myscript.sh
log        = myscript.log.$(ClusterID)-$(Process)
output     = myscript.out.$(ClusterID)-$(Process)
error      = myscript.err.$(ClusterID)-$(Process)
queue

You can now start it with the command

condor_submit myscript.sub
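condor_submit reports the cluster ID of the new job on a line such as "1 job(s) submitted to cluster 3.", and that number is what $(ClusterID) expands to in the file names of the submit file. A small shell helper can pull it out. This is a sketch: the function name cluster_id_from_submit is hypothetical, and it assumes the message keeps exactly this wording.

```shell
# Read condor_submit output on standard input and print the cluster ID
# found on the "... submitted to cluster N." line (nothing otherwise).
cluster_id_from_submit() {
  sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Hypothetical usage on the cluster:
# cluster=$(condor_submit myscript.sub | cluster_id_from_submit)
# echo "output will be in myscript.out.${cluster}-0"
```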

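When the job is done, you will have three files, myscript.log.xx-0, myscript.out.xx-0 and myscript.err.xx-0, where xx is the cluster ID of your job. They contain, respectively, HTCondor's log of the job, what your script printed on standard output, and its error messages.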

Note: With HTCondor you can start several “tasks” with one command: just add the number of tasks you want to start after the “queue” keyword (for example, queue 10 starts ten tasks).
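As an illustration, the shell snippet below writes a hypothetical submit file, myscript_multi.sub, that starts 10 tasks of myscript.sh in a single submission. The arguments line, which hands each task its own $(Process) number as the script's first argument, is an optional addition for illustration; myscript.sh as written above simply ignores it.

```shell
# Write a hypothetical multi-task submit file. The quoted 'EOF' stops the
# shell from expanding $(ClusterID) and $(Process), so HTCondor sees them.
cat > myscript_multi.sub <<'EOF'
executable = myscript.sh
arguments  = $(Process)
log        = myscript.log.$(ClusterID)-$(Process)
output     = myscript.out.$(ClusterID)-$(Process)
error      = myscript.err.$(ClusterID)-$(Process)
queue 10
EOF
```

Submitting it with condor_submit myscript_multi.sub would then create one log, out and err file per task, numbered 0 to 9.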

Note: $(ClusterID)-$(Process) prevents the output files from being overwritten when you submit several times or start several tasks. ClusterID is the ID of your current submission, and Process is the ID of each task within it, counted from 0.
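To see how this pattern keeps files apart, the hypothetical helper below prints the output file names that myscript.out.$(ClusterID)-$(Process) gives for a submission with a given cluster ID and number of tasks. It only reproduces the naming rule in the shell; it does not query HTCondor.

```shell
# Print the standard-output file names for one submission.
# $1: cluster ID, $2: number of tasks (the number after "queue").
output_files() {
  cluster="$1"
  ntasks="$2"
  process=0
  # Process numbers run from 0 to ntasks - 1.
  while [ "$process" -lt "$ntasks" ]; do
    echo "myscript.out.${cluster}-${process}"
    process=$((process + 1))
  done
}
```

For example, output_files 12 3 prints myscript.out.12-0, myscript.out.12-1 and myscript.out.12-2, while a second submission with cluster ID 13 would produce names ending in 13-0, 13-1 and 13-2, so nothing is overwritten.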

Following the Status of a Job

The condor_q command displays information about the jobs you have currently submitted. Its output looks like this:

[username@t3ps test]$ condor_q

-- Schedd: t3ps.najah.edu : <172.16.1.17:9618?... @ 11/18/19 10:49:28
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
username ID: 3       11/18 10:49      _      1      _      1 3.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for username: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

The most important columns are:

OWNER:

Your Unix account

BATCH_NAME:

The name of your job batch; when you do not set one, it is shown as ID: followed by the cluster ID, as in ID: 3 above

SUBMITTED:

The date and time when the job was submitted

DONE:

The number of tasks finished

RUN:

The number of tasks currently running

IDLE:

The number of tasks still waiting to run
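The summary lines at the end of the condor_q output give similar counts for all your tasks at once. If you want to use them in a script, a sketch like the one below extracts one count from the "Total for query" line. It assumes the line has exactly the format shown in the example output above, and the function name count_in_state is hypothetical.

```shell
# Read condor_q output on standard input and print the number of tasks in
# the given state (e.g. idle, running, held), taken from the
# "Total for query:" summary line. Assumes the format shown above.
count_in_state() {
  sed -n "s/^Total for query:.* \([0-9][0-9]*\) $1.*/\1/p"
}

# Hypothetical usage on the cluster: wait until none of your tasks are
# idle or running any more, checking once per minute.
# while [ "$(condor_q | count_in_state idle)" != "0" ] ||
#       [ "$(condor_q | count_in_state running)" != "0" ]; do
#   sleep 60
# done
```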