Starting with Condor
Starting a Job with Condor
On a cluster, a job is not interactive. You need to pack your job into a script (a shell script, for example) so that it runs as a single, non-interactive command.
For example, if you want to print the name of the worker node and the current time, wait one minute, and print the time again, you have to pack those commands into one script (myscript.sh) like this:
#!/bin/bash
hostname # print the current worker node
date # print the current time
sleep 60 # wait 1 min
date # print the current time
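The same pattern works for any sequence of commands. As a minimal sketch, a job that goes to a specific directory and lists all files in it could be packed like this. The JOB_DIR variable and its default value are assumptions for illustration, not something the cluster provides:

```shell
#!/bin/bash
# Hypothetical example job: go to a directory and list every file in it.
# JOB_DIR is an illustrative variable; it defaults to your home directory.
JOB_DIR="${JOB_DIR:-$HOME}"

# Stop with a non-zero exit code if the directory cannot be entered,
# so the failure shows up in the job's error file instead of listing the wrong place.
cd "$JOB_DIR" || exit 1

ls -la   # list all files, including hidden ones; this goes to standard output
```

Because the job cannot ask you anything, every input (here the directory) has to be written into the script or passed to it in advance.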
When this step is finished, you can ask the cluster to start your script.
Also, because a job on the cluster is not interactive, you need to create a job description file that tells the batch system which script to run and where to store everything your job would normally print on the screen. For HTCondor, the batch system we use, this description file (myscript.sub) looks like this:
executable = myscript.sh
log = myscript.log.$(ClusterID)-$(Process)
output = myscript.out.$(ClusterID)-$(Process)
error = myscript.err.$(ClusterID)-$(Process)
queue
You can now start it with the command
condor_submit myscript.sub
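condor_submit reports the cluster ID it assigned to your submission, which is the number that replaces $(ClusterID) in your file names. The sketch below extracts it from the usual “N job(s) submitted to cluster X.” line. The function name is hypothetical, and the exact wording of that message is an assumption that may vary between HTCondor versions:

```shell
#!/bin/bash
# Hypothetical helper: read condor_submit output on stdin and print the cluster ID.
# Assumes the summary line looks like "1 job(s) submitted to cluster 3."
cluster_id_from_submit_output() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}
```

Usage: CLUSTER=$(condor_submit myscript.sub | cluster_id_from_submit_output)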
When the job is done, you will have three files, myscript.log.xx-0, myscript.out.xx-0 and myscript.err.xx-0 (where xx is the cluster ID), which contain the HTCondor log, the standard output and the standard error of your job, respectively.
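Once the job has finished, you can print those three files in one go. This is a minimal sketch that assumes the file names used in myscript.sub above; the function name and its arguments are illustrative:

```shell
#!/bin/bash
# Hypothetical helper: print the log, output and error files of one task.
# $1 = cluster ID (the "xx" in the file names), $2 = process ID (defaults to 0).
show_job_files() {
    local cluster="$1" process="${2:-0}" kind file
    for kind in log out err; do
        file="myscript.${kind}.${cluster}-${process}"
        echo "=== ${file} ==="
        if [ -f "$file" ]; then
            cat "$file"
        else
            echo "(not found)"
        fi
    done
}
```

Usage: show_job_files 3 prints myscript.log.3-0, myscript.out.3-0 and myscript.err.3-0.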
Note: With HTCondor you can start several “tasks” with one command. You just need to add the number of tasks you want to start after the “queue” keyword (for example, queue 10).
Note: $(ClusterID)-$(Process) prevents the output from being overwritten when you submit several times. ClusterID is the ID of your current job (one per submission), and Process is the ID of each task within it, starting at 0.
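As a sketch of the multi-task note, the script below writes a variant of myscript.sub that starts several tasks in one submission and then submits it if condor_submit is available. The file name myscript_multi.sub and the task count of 5 are arbitrary choices for illustration:

```shell
#!/bin/bash
# Number of tasks to start; an arbitrary value for this example.
NTASKS=5

# Write the job description. The quoted 'EOF' keeps $(ClusterID) and $(Process)
# literal, so that HTCondor, not the shell, expands them for each task.
cat > myscript_multi.sub <<'EOF'
executable = myscript.sh
log = myscript.log.$(ClusterID)-$(Process)
output = myscript.out.$(ClusterID)-$(Process)
error = myscript.err.$(ClusterID)-$(Process)
EOF
echo "queue ${NTASKS}" >> myscript_multi.sub

# Submit only if the HTCondor tools are installed on this machine.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit myscript_multi.sub
fi
```

With queue 5, the tasks get process IDs 0 to 4, so the output files are named myscript.out.xx-0 through myscript.out.xx-4 and none of them overwrites another.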
Following a Job's Status
The condor_q command displays information about your currently submitted jobs. Its output looks like this:
[username@t3ps test]$ condor_q
-- Schedd: t3ps.najah.edu : <172.16.1.17:9618?... @ 11/18/19 10:49:28
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
username ID: 3 11/18 10:49 _ 1 _ 1 3.0
Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for username: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
The most important columns are:
- OWNER: your Unix account
- BATCH_NAME: the internal name of your job
- SUBMITTED: the date when the job was submitted
- DONE: the number of tasks finished
- RUN: the number of tasks currently running
- IDLE: the number of tasks still waiting to run
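If you want the state of each task rather than the summary line, condor_q can print selected job attributes with its -af (autoformat) option, for example condor_q 3 -af ProcId JobStatus. The sketch below turns that two-column output into readable states. The function name is hypothetical, and it only names the numeric JobStatus codes listed in its comments; anything else is printed as OTHER:

```shell
#!/bin/bash
# Hypothetical helper: read "ProcId JobStatus" pairs on stdin
# (as printed by `condor_q <cluster> -af ProcId JobStatus`)
# and print one readable line per task.
describe_tasks() {
    local proc status name
    while read -r proc status; do
        case "$status" in
            1) name="IDLE" ;;        # waiting to run
            2) name="RUNNING" ;;
            3) name="REMOVED" ;;
            4) name="COMPLETED" ;;
            5) name="HELD" ;;
            *) name="OTHER($status)" ;;
        esac
        echo "task $proc: $name"
    done
}
```

Usage: condor_q 3 -af ProcId JobStatus | describe_tasks. Finished tasks normally leave the queue, so they simply stop appearing in this list.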