Submitting Simple Jobs from Condor to Globus Toolkit 4

This tutorial, which you should read following the Condor-GT4 Introduction, explains how to submit "Grid" jobs using Condor - more precisely, how to submit jobs to a Globus Toolkit 4 host acting as a front end to one of the Grid sites. The target audience is end users who understand their application software and have used Condor before to submit jobs to a local Condor pool. We assume that your administrator has already set up the Condor submission host according to this page.

Submitting a simple job to a Grid site is as easy as one-two-three:

  1. Authenticate with grid-proxy-init
  2. Prepare a suitable command file (job.cmd)
  3. condor_submit job.cmd

However, if you wish to submit a group of hundreds (or more) of such "simple jobs", it is advisable to package them into a larger multi-processor job. The second part of this tutorial explains how to do this effectively without losing the benefits of parallel execution.

Submitting your first Grid job

First, determine what your "Condor submission host" is. This is the machine on which you run the command condor_submit. It can be your own workstation or some other machine. Ask your administrator if in doubt.

Second, determine from which machine you are going to authenticate with your Grid user certificate. This is the machine on which you run the command grid-proxy-init. It can be either your own workstation or the Condor submission host. Again, ask your administrator if necessary.

Authenticate with grid-proxy-init

The first step needs to be performed only once per day (not once per job): by default, the generated proxy remains valid for 12 hours.

Run the grid-proxy-init command. It will ask you for the pass phrase protecting your Grid user certificate. If everything goes well, it will generate a temporary file, called a "proxy certificate", in /tmp/x509up_u<your Unix user id>.
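A successful run looks roughly like this (the exact wording varies between Globus Toolkit versions; the identity and expiry time are of course your own):

jploski@pcoffis26:~> grid-proxy-init
Your identity: <the subject name of your Grid user certificate>
Enter GRID pass phrase for this identity:
Creating proxy .................................... Done
Your proxy is valid until: <a date about 12 hours from now>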

(Optional) Log in to the submission host

If you have run grid-proxy-init on a machine other than the Condor submission host, then use gsissh now to log into the submission host:

gsissh srvgrid01.offis.uni-oldenburg.de

gsissh authenticates with your proxy certificate and automatically delegates it to the submission host. If you are already working on the submission host, this step is of course unnecessary.
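To verify that the proxy has indeed arrived (and to see how long it remains valid), run the grid-proxy-info command on the submission host; it prints, among other things, the subject, the file location, and the remaining lifetime of your proxy:

grid-proxy-info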

Prepare a suitable command file

In your job command file, instead of using the standard or vanilla universe as when submitting to a local pool, you should use the grid universe and specify to which Grid site the job should be submitted:

universe = grid
grid_resource = gt4 https://srvgrid01.offis.uni-oldenburg.de/wsrf/services/ManagedJobFactoryService PBS

srvgrid01.offis.uni-oldenburg.de is the Grid site name (more precisely, the name of the host running Globus Toolkit 4). The grid_resource line consists of three parts: the grid type (gt4), the contact URL of the site's ManagedJobFactoryService, and the local batch scheduler (here PBS) through which the site dispatches jobs to its worker nodes. Here is a complete example of a simple job command file:

# Run /bin/bash (already present on the worker nodes) with our script as its argument
executable = /bin/bash
arguments = yourscript.sh
transfer_executable = false
# Ship the script with the job and fetch the outputs when it exits
transfer_input_files = yourscript.sh
when_to_transfer_output = ON_EXIT
# Send the job to the Grid site instead of the local pool
universe = grid
grid_resource = gt4 https://srvgrid01.offis.uni-oldenburg.de/wsrf/services/ManagedJobFactoryService PBS
# Standard output/error of the job and Condor's own event log
output = test.out
error = test.err
log = test.log
queue
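The file yourscript.sh can be any shell script. For a first test, a trivial payload like the following is sufficient (its contents are purely illustrative):

#!/bin/bash
# Report where and when the job actually ran
echo "Running on $(hostname)"
date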

You can use the normal Condor options in the command file (most notably transfer_input_files and transfer_output_files). One exception is the requirements option: when you submit a job to a specific target Grid site, it is implicitly assumed that the machines ("worker nodes") of that site fulfill the job's needs, or that you ship all necessary files with the job. All worker nodes within a Grid site typically have the same (or at least very similar) hardware configuration, so there is less need to specify hardware requirements or preferences than in traditional Condor pools. Furthermore, each of your jobs is guaranteed to run from start to finish on the same assigned worker node. Note that the grid universe resembles the vanilla universe in that it supports neither checkpointing nor delegating system calls back to the submission host.
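For example, if your script additionally reads an input file and produces a result file (the names params.dat and results.dat below are just placeholders), you would extend the command file like this:

transfer_input_files = yourscript.sh, params.dat
transfer_output_files = results.dat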

As a side note, it is technically possible to let Condor select the target Grid site. For now, we assume that the name of the target Grid site is provided by the user in the grid_resource line of the command file.

Submit your job

This step is no different from submitting any other Condor job: run condor_submit job.cmd, where job.cmd is the job command file created in the previous step.
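The output looks roughly like this (the cluster number will of course differ):

jploski@pcoffis26:~/condor-gt4> condor_submit job.cmd
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 107.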

As usual, you can use condor_q to see your job:

jploski@pcoffis26:~/condor-gt4> condor_q

-- Submitter: pcoffis26.offis.uni-oldenburg.de : <134.106.52.79:20320> : pcoffis26.offis.uni-oldenburg.de
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
 107.0   jploski        11/5  13:30   0+00:00:00 I  0   9.8  bash              
 108.0   jploski        11/5  13:30   0+00:00:00 R  0   0.0  gridftp_wrapper.sh

The second entry, gridftp_wrapper.sh, is a helper job that Condor starts automatically to handle the file transfers between the submission host and the Grid site; you do not need to manage it yourself. You can also use condor_q -globus to see further details about the Grid job:

jploski@pcoffis26:~/condor-gt4> condor_q -globus

-- Submitter: pcoffis26.offis.uni-oldenburg.de : <134.106.52.79:20320> : pcoffis26.offis.uni-oldenburg.de
 ID      OWNER          STATUS  MANAGER  HOST                EXECUTABLE        
 107.0   jploski       STAGE_IN PBS      srvgrid01.offis.un  /bin/bash         

During the job's lifetime, the STATUS will change as follows:

  1. UNSUBMITTED - the job has not left the Condor submission host yet
  2. STAGE_IN - input files are being transferred from the Condor submission host to the Grid site
  3. PENDING - the job has arrived at the Grid site completely, but is not running yet (for example, because the site is occupied by other jobs)
  4. ACTIVE - the job is being executed at the Grid site
  5. STAGE_OUT - output files are being transferred from the Grid site back to the Condor submission host
  6. DONE - the job has finished and its output files have arrived back at the Condor submission host

You can also look at the log file produced by the job (test.log in the example above), which tracks the same status changes and may contain additional details in case of an error.
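To follow the job's progress interactively you can simply tail the log file; to block until the job has finished (handy in driver scripts), the standard condor_wait tool works here as well:

tail -f test.log
condor_wait test.log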

Submitting "big" jobs

Now that you are comfortable submitting small and useless Grid jobs, you can advance to the big and useful Grid jobs.