Submitting Complex Jobs from Condor to Globus Toolkit 4

This tutorial, which you should read after the Condor-GT4 Introduction and the first part about submitting simple jobs, explains how to submit big Grid jobs using Condor. "Big" here means "consisting of many smaller subjobs" rather than "long running".

It turns out that the Condor-GT4 combination is not very efficient at handling large numbers of small jobs, due to communication overheads. For example, in an experiment that consisted of submitting 500 small single-processor jobs, the total processing time (from the submission of the first job to the completion of the last) was 2 hours, with some errors encountered along the way. Simply repackaging the jobs into a few multi-processor jobs shortened the processing time to 1 hour and eliminated the errors. This page contains a recipe for such repackaging.

A multi-processor job

Suppose that you have 500 independent jobs, as in the above example. Rather than submit each of them separately, we would like to submit just 32 jobs, each of which is assigned 16 processors and internally executes 16 of the original "subjobs" in parallel. This substantially reduces the number of file transfers (during the stage-in and stage-out phases) from (at least) 500 to (at least) 32.
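
To make the arithmetic behind this repackaging explicit: the original subjob with index s becomes processor s mod 16 of packaged job number floor(s/16). The following throwaway Perl sketch only illustrates this bookkeeping; the variable names are ours, and none of the files discussed below contain this code.

# Illustrative only: map 500 original subjobs onto 32 packaged jobs x 16 processors.
use strict;
use warnings;

my $subjobs_total = 500;   # original single-processor jobs
my $per_job       = 16;    # processors (and subjobs) per packaged job

for my $s (0 .. $subjobs_total - 1) {
    my $job  = int($s / $per_job);   # packaged job number, 0..31
    my $proc = $s % $per_job;        # processor within that job, 0..15
    print "subjob $s -> packaged job $job, processor $proc\n";
}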

Here is an example command file which submits a multiprocessor job:

globus_xml            = <count>16</count><jobType>multiple</jobType><queue>dgipar</queue>
executable            = /bin/bash
arguments             = test.sh
error                 = test.err
output                = test.out
log                   = test.log
universe              = grid
grid_resource         = gt4 https://srvgrid01.offis.uni-oldenburg.de/wsrf/services/ManagedJobFactoryService PBS
whentotransferoutput  = ON_EXIT
notification          = Error
should_transfer_files = ALWAYS
transfer_executable   = false
transfer_input_files  = test.sh
queue

When you submit such a file, the following happens:

  • 16 processors in total are allocated - due to <count>16</count>
  • on each of these processors the same executable with the same arguments is started concurrently - due to <jobType>multiple</jobType>; the job finishes only when all of the copies have terminated
  • at the Grid site the job is routed through the dgipar queue - due to <queue>dgipar</queue> (this is a D-Grid requirement; you may need to adjust it for your site)

Of course, this is not exactly what you need:

  • each processor should run a job with different arguments, corresponding to the part of the parameter space to explore or the part of the input to process
  • it would be nice (or even essential) to be able to initialize the working environment before the first subjob starts (for example, to create a required directory structure) and to clean it up afterwards (for example, to pack the results into a neat .tar.gz archive that can then be retrieved with the transfer_output_files option)

We have developed a little Perl module called MultiJob.pm which, when submitted together with your job, gives you exactly this flexibility.

A multi-processor job with MultiJob.pm

Download the complete example with all required files.

The job command file in the example looks as follows:

globus_xml            = <count>16</count><jobType>multiple</jobType><queue>dgipar</queue>
executable            = /bin/bash
arguments             = multijob-runner.sh 16
error                 = test.err
output                = test.out
log                   = test.log
universe              = grid
grid_resource         = gt4 https://srvgrid01.offis.uni-oldenburg.de/wsrf/services/ManagedJobFactoryService PBS
whentotransferoutput  = ON_EXIT
notification          = Error
should_transfer_files = ALWAYS
transfer_executable   = false
transfer_input_files  = multijob-runner.sh,test.pl,MultiJob.pm,test.sh
queue

There are some new files involved here:

multijob-runner.sh is a tiny script which contains just

. /etc/profile
. ~/.profile
perl test.pl $*

It sets up the environment variables (sadly, Globus does not always do this correctly) and then forwards all command-line arguments to test.pl.

test.pl is a custom Perl script which contains the code that should be run

  • once, for the initialization
  • then, concurrently on each allocated processor
  • finally, once, for the cleanup

For example:

use MultiJob;

my $np = $ARGV[0];

MultiJob::run($np, \&pre, \&main, \&post);

sub pre
{
    # This code is run only by the 'master' process, before everyone runs main
    # ...
    print "pre\n";
}

sub main
{
    my $master = shift;  # 1 for the 'master' process, 0 for all others
    my $proc = shift;    # process number 0..n-1

    # This code is run by each process
    # ...
    system("/bin/bash test.sh $proc");
}

sub post
{
    # This code is run only by the 'master' process, after everyone exits main
    # ...
    print "post\n";
}

The total number of processes is passed to test.pl as a command-line argument ($np); note that the value 16 therefore appears twice in the Condor command file, once in globus_xml and once in arguments, and the two must match. test.pl then invokes the MultiJob module, passing it $np and references to the three subroutines pre, main and post.
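
If you would rather not keep the two occurrences of 16 in sync by hand, condor_submit lets you define your own macro and reference it as $(...) elsewhere in the file. A minimal sketch, changing only the affected lines of the command file above:

NP                    = 16
globus_xml            = <count>$(NP)</count><jobType>multiple</jobType><queue>dgipar</queue>
arguments             = multijob-runner.sh $(NP)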

The original test.sh script is invoked from the main subroutine and receives the process number as its command-line argument. Because this argument differs for each process, it can be used to select the part of the input to work on or to alter the process's behavior in any way you like. Obviously, instead of delegating the processing work to test.sh, you could also implement it entirely in Perl; whether that is a good idea depends on your programming skill and on how much pre-existing shell script code you can reuse.
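
For illustration, a pure-Perl main might look like the sketch below. The per-process input file naming (input-<proc>.dat) and the process_file routine are assumptions made for this example, not part of the downloadable archive.

sub main
{
    my $master = shift;   # 1 for the 'master' process, 0 for all others
    my $proc   = shift;   # process number 0..n-1

    # Assumption: one input file per process has been staged in beforehand.
    my $input = "input-$proc.dat";

    # process_file() is a hypothetical stand-in for your real Perl processing code.
    process_file($input);
}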

The job is submitted with condor_submit, as usual. The example job from the archive mentioned at the beginning of this section should produce output similar to the following:

pre
Running process number 0, pid=14444 on node2
Running process number 1, pid=14447 on node2
Running process number 2, pid=14450 on node2
Running process number 3, pid=14453 on node2
Running process number 4, pid=14456 on node2
Running process number 5, pid=14459 on node2
Finished process number 0, pid=14444 on node2
Running process number 6, pid=14462 on node2
Running process number 7, pid=14465 on node2
Finished process number 1, pid=14447 on node2
Running process number 8, pid=20386 on node1
Finished process number 2, pid=14450 on node2
Finished process number 3, pid=14453 on node2
Running process number 9, pid=20389 on node1
Finished process number 4, pid=14456 on node2
Finished process number 5, pid=14459 on node2
Running process number 10, pid=20392 on node1
Finished process number 6, pid=14462 on node2
Finished process number 7, pid=14465 on node2
Running process number 11, pid=20395 on node1
Finished process number 8, pid=20386 on node1
Running process number 12, pid=20398 on node1
Running process number 13, pid=20401 on node1
Finished process number 9, pid=20389 on node1
Running process number 14, pid=20404 on node1
Running process number 15, pid=20407 on node1
Finished process number 10, pid=20392 on node1
Finished process number 11, pid=20395 on node1
Finished process number 12, pid=20398 on node1
Finished process number 13, pid=20401 on node1
Finished process number 14, pid=20404 on node1
Finished process number 15, pid=20407 on node1
post

In a real application, you would likely generate and submit the Condor command file from another script which you invoke with arguments describing the input(s) to be processed by the job.
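
As an illustration, a minimal generator script in Perl could look like the sketch below. The file name make-job.pl, the single <job-tag> argument and the fixed 16-processor setup are assumptions made for this example; everything else is copied from the command file shown above and should be adjusted to your site.

# make-job.pl: write a Condor command file for one packaged job and submit it.
# Usage (hypothetical): perl make-job.pl <job-tag>
use strict;
use warnings;

my $tag = shift @ARGV or die "usage: perl make-job.pl <job-tag>\n";
my $np  = 16;                    # processors per packaged job
my $cmdfile = "job-$tag.cmd";

open my $fh, '>', $cmdfile or die "cannot write $cmdfile: $!";
print $fh <<"END";
globus_xml            = <count>$np</count><jobType>multiple</jobType><queue>dgipar</queue>
executable            = /bin/bash
arguments             = multijob-runner.sh $np
error                 = job-$tag.err
output                = job-$tag.out
log                   = job-$tag.log
universe              = grid
grid_resource         = gt4 https://srvgrid01.offis.uni-oldenburg.de/wsrf/services/ManagedJobFactoryService PBS
whentotransferoutput  = ON_EXIT
notification          = Error
should_transfer_files = ALWAYS
transfer_executable   = false
transfer_input_files  = multijob-runner.sh,test.pl,MultiJob.pm,test.sh
queue
END
close $fh;

# Hand the generated command file to Condor.
system("condor_submit", $cmdfile) == 0 or die "condor_submit failed";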

Next steps in conquering the Grid

Now that you know how to package your jobs for efficient execution at an explicitly specified GT4 site, how about automating the site selection so that you can transparently use multiple sites? Although this may require some help from your administrator, it is not difficult if you follow the examples given in Condor-GT4-Metascheduling.