Metascheduling with Condor-G

The first part of this tutorial explained how to submit Condor jobs to an explicitly specified Globus Toolkit 4 site. In a Grid context, however, there are usually multiple target sites to choose from. This part shows how to implement so-called metascheduling, that is, selecting the target Grid site automatically based on technical or user-specified criteria.

Match-making with substitution macros

It would be possible to implement metascheduling outside of Condor: based on some external information, simply generate a Condor command file whose grid_resource line satisfies the user's requirements. However, another approach integrates with Condor more smoothly, namely taking advantage of Condor's match-making mechanism. Condor normally uses this mechanism to find machines in a local pool that satisfy job requirements (and vice versa: to find jobs that satisfy each machine's requirements). The only thing that changes with Condor-G is that instead of matching against the advertisements (ClassAds) of individual machines on a LAN, we want the matching to occur against advertised GT4 head nodes. Only minimal adjustments are needed in the job command file. Rather than a hard-coded grid_resource line, we use a line containing placeholders. These placeholders, also known as substitution macros, are replaced with actual values taken from the ClassAd of the matching GT4 head node. As discussed later, the match is based on the job's universe (grid), but may also involve additional job and machine attributes.

Here is a simple example of a job file with placeholders:

executable = /bin/bash
arguments = yourscript.sh
transfer_executable = false
transfer_input_files = yourscript.sh
when_to_transfer_output = ON_EXIT
universe = grid
grid_resource = gt4 $$(gatekeeper_url) $$(job_manager_type)
output = test.out
error = test.err
log = test.log
queue

The names of the placeholders can be chosen freely. The only requirement is that they match the names of attributes from the machine ClassAd which describes a matching GT4 head node, discussed next. Note that the placeholders can also appear in other (but not all) lines of the job command file, not just in the grid_resource line. This is useful if you wish to extract other information, such as paths or the name of the target Globus queue, from the machine ClassAd.
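To make the substitution step concrete, here is a minimal Python sketch that mimics how $$(name) placeholders are filled in from a matched machine ClassAd. It is only an illustrative model: Condor performs this expansion itself after a match, and the helper expand_macros is hypothetical. It handles plain attribute names only and looks them up case-insensitively, since ClassAd attribute names are case-insensitive.

```python
import re

# A $$(name) substitution macro; only plain attribute names are modeled here.
MACRO = re.compile(r"\$\$\((\w+)\)")


def expand_macros(line, machine_ad):
    """Replace each $$(name) in a submit-file line with the value of the
    attribute `name` from the matched machine ClassAd (given as a dict).

    Hypothetical helper for illustration; Condor does this internally.
    """
    # ClassAd attribute names are case-insensitive, so normalize the keys.
    attrs = {key.lower(): value for key, value in machine_ad.items()}

    def lookup(match):
        name = match.group(1).lower()
        if name not in attrs:
            # In this model, a missing attribute is simply an error.
            raise KeyError(f"matched ClassAd lacks attribute {match.group(1)!r}")
        return str(attrs[name])

    return MACRO.sub(lookup, line)


if __name__ == "__main__":
    head_node = {
        "gatekeeper_url": "https://srvgrid01.offis.uni-oldenburg.de"
                          "/wsrf/services/ManagedJobFactoryService",
        "job_manager_type": "PBS",
    }
    print(expand_macros(
        "grid_resource = gt4 $$(gatekeeper_url) $$(job_manager_type)", head_node))
```

Running the demo prints the grid_resource line with the head node's URL and PBS filled in, which is what the job ends up with after matching the ClassAd shown in the next section.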

While it is conceivable to use a single placeholder for the entire grid_resource line, unfortunately this does not work well with GT4. When the keyword gt4 is not literally present in the grid_resource line, condor_submit does not automatically insert the path to the proxy certificate into the job ClassAd, which causes problems during job execution.

Advertising GT4 head nodes

In a local Condor pool, the condor_startd daemon running on each individual machine (also called a "worker node" or "execution host") regularly advertises the machine to the condor_collector daemon running on the master machine. This is how the pool is formed.

In a Grid context, we are normally not authorized to run condor_startd (or, in fact, any other daemon) on GT4 head nodes located at other research institutions. Nor can we usually convince remote administrators to install and maintain such software at their sites. Fortunately, this is not necessary: machine advertisements can be posted from any machine that has write access to the condor_collector daemon.

Incidentally, this is the same level of access that is required for submitting jobs (jobs are ClassAds, too), meaning that we can advertise GT4 head nodes from any host where condor_submit is allowed. While this makes setting up metascheduling easy, it also has serious security implications (not specific to Condor-G): any person allowed to submit jobs may also publish rogue machine advertisements and thus "hijack" other users' jobs to run on her machines. The current development version of Condor (6.9.5) provides fine-grained security options to prevent this sort of abuse. With the stable version, the administrators' only countermeasures seem to be monitoring the pool for "alien" machines and/or trusting the users authorized to run condor_submit.
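For the monitoring countermeasure, a minimal Python sketch might compare the names of all advertised machines against a list of head nodes the administrator trusts. The allowlist and both function names are hypothetical; the sketch assumes condor_status is on the PATH and uses its -format option to print the Name attribute of each machine ad on its own line.

```python
import subprocess


def advertised_machine_names():
    """Return the Name attribute of every machine ClassAd in the pool.

    Assumes condor_status is on the PATH of the calling user.
    """
    result = subprocess.run(
        ["condor_status", "-format", "%s\n", "Name"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def find_alien_machines(advertised, trusted):
    """Return advertised names that are not on the trusted allowlist.

    Host names are compared case-insensitively; the result is sorted
    and free of duplicates.
    """
    trusted_lower = {name.lower() for name in trusted}
    return sorted({name for name in advertised if name.lower() not in trusted_lower})
```

Run periodically (for example from cron) as find_alien_machines(advertised_machine_names(), trusted), where trusted is the administrator's list of legitimate head nodes, this reports any machine ad nobody has vouched for. It only detects rogue ads after the fact; it does not prevent a match that happens in the meantime.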

A machine ClassAd (which, once posted, can be inspected with condor_status -l) is just a set of attributes and corresponding values. To prepare a ClassAd for a GT4 head node, create a simple text file with the following content (replacing the machine name srvgrid01.offis.uni-oldenburg.de with that of your own head node):

MyType                   = "Machine"
TargetType               = "Job"
Name                     = "srvgrid01.offis.uni-oldenburg.de"
Machine                  = "srvgrid01.offis.uni-oldenburg.de"
gatekeeper_url           = "https://srvgrid01.offis.uni-oldenburg.de/wsrf/services/ManagedJobFactoryService"
job_manager_type         = "PBS"
Requirements             = (TARGET.JobUniverse == 9)
Rank                     = 0.000000
CurrentRank              = 0.000000
WantAdRevaluate          = True
OpSys                    = "LINUX"
Arch                     = "X86_64"
ClassAdLifetime          = 60
State                    = "Owner"
Activity                 = "Idle"
UpdateSequenceNumber     = 1
wisent_GlobusQueue       = "test"
wisent_DefaultWRFVariant = "19"

Most of the attributes are self-explanatory. The Requirements expression restricts matches to jobs in the grid universe (JobUniverse 9), which is the universe-based matching mentioned earlier. The State and Activity attributes are only there to keep condor_status from complaining about missing standard attributes. ClassAdLifetime is the time, in seconds, after which the ClassAd expires from condor_collector unless it is replaced by a newer version first. UpdateSequenceNumber should be increased on every update (not necessarily by 1; the output of date +%s works just as well). The last two attributes, prefixed with wisent_, are probably only useful in our research project: the name of the target Globus queue (CE) for submitting jobs, and the number of the WRF software variant installed at that particular Grid site.
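Since the ClassAd file has to be regenerated for every update, it is convenient to produce it from a script. The following Python sketch (helper names hypothetical) writes a head-node ClassAd in the format shown above: strings are quoted, numbers and booleans are written as literals, values wrapped in Expr are emitted unquoted as ClassAd expressions, and UpdateSequenceNumber defaults to the current Unix time, like date +%s.

```python
import time


class Expr(str):
    """A value to be written verbatim, i.e. as an unquoted ClassAd expression."""


def classad_value(value):
    """Render a Python value as a ClassAd literal."""
    if isinstance(value, bool):          # before int: bool is an int subclass
        return "True" if value else "False"
    if isinstance(value, Expr):          # before str: Expr is a str subclass
        return str(value)
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported ClassAd value: {value!r}")


def render_head_node_ad(name, extra, lifetime=60, seq=None):
    """Return the text of a machine ClassAd describing one GT4 head node.

    `extra` holds site-specific attributes such as gatekeeper_url.
    """
    ad = {
        "MyType": "Machine",
        "TargetType": "Job",
        "Name": name,
        "Machine": name,
        "Requirements": Expr("(TARGET.JobUniverse == 9)"),  # grid universe only
        "State": "Owner",        # only to keep condor_status happy
        "Activity": "Idle",
        "ClassAdLifetime": lifetime,
        # Must increase on every update; the Unix time (date +%s) does that.
        "UpdateSequenceNumber": int(time.time()) if seq is None else seq,
    }
    ad.update(extra)
    width = max(len(key) for key in ad)
    return "".join(f"{key:<{width}} = {classad_value(val)}\n" for key, val in ad.items())
```

Site-specific values such as gatekeeper_url, job_manager_type, OpSys, Arch and the wisent_ attributes go into extra, which also lets a caller override any of the defaults.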

To post the ClassAd manually, run:

condor_advertise UPDATE_STARTD_AD /path/to/classad_file.txt

In a production setup, this command would have to be executed at regular intervals, shorter than ClassAdLifetime so that the ad does not expire between updates, and the content of the posted ClassAd file might have to be updated to reflect the Grid site's current state.
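A minimal sketch of such an update loop in Python, assuming condor_advertise is on the PATH and render is any function returning the ClassAd text (for example, one that queries the site and stamps a fresh UpdateSequenceNumber). The function name is hypothetical; run and sleep are parameters only so that the loop can be exercised without a Condor installation.

```python
import os
import subprocess
import sys
import time


def advertise_periodically(render, path, interval, lifetime,
                           rounds=None, run=subprocess.run, sleep=time.sleep):
    """Regenerate the ClassAd file with render() and post it via
    condor_advertise every `interval` seconds (forever if rounds is None).
    """
    if interval >= lifetime:
        # Otherwise the ad would expire from condor_collector between updates.
        raise ValueError("interval must be shorter than ClassAdLifetime")
    done = 0
    while rounds is None or done < rounds:
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            f.write(render())
        os.replace(tmp, path)  # atomic rename: readers never see a partial file
        try:
            run(["condor_advertise", "UPDATE_STARTD_AD", path], check=True)
        except (subprocess.CalledProcessError, OSError) as err:
            # Keep going: the previous ad stays valid until ClassAdLifetime runs out.
            print(f"condor_advertise failed: {err}", file=sys.stderr)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
```

Choosing an interval well below the lifetime (say 20 seconds for the 60 seconds used above) means a single failed update does not make the head node vanish from the pool.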

Immediately after running condor_advertise, you should see a new machine appear in condor_status:

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
srvgrid01.off LINUX       X86_64 Owner      Idle       [???]  [??]   [Unknown]

The load average and memory are not displayed because the corresponding attributes are missing from the machine ClassAd; these attributes make little sense for a GT4 head node managing an entire cluster. Here we see a small "semantic mismatch" between the Condor command-line tools and the Grid: Condor traditionally expects machines to be real computers that accommodate a single job at a time, with an "activity" life cycle reflecting the job handling state. These assumptions simply do not hold for Grid sites consisting of many machines. However, the slightly confusing output should not worry us much: submitting multiple jobs to the target Grid site is possible, as is keeping track of various Grid site attributes. For serious applications, the standard end-user command-line tools can be replaced by more sophisticated versions that understand and make good use of the custom attributes.

If the entry does not appear in condor_status after you run condor_advertise, the likely cause is that you are not being authenticated properly or are not authorized to run condor_advertise on that particular host. Check the CollectorLog on the master machine of the Condor pool for a message like "PERMISSION DENIED to unknown user from host ...". Security options such as the ones below should remedy the problem; consult the Condor Manual for their meaning:
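When hunting for such messages in a large log, a small Python filter can help. This is a sketch: the function name is hypothetical, and the regular expression assumes only that relevant lines contain "PERMISSION DENIED to", the user, "from host" and then the host as a single whitespace-free token. Check that assumption against the lines your Condor version actually writes.

```python
import re
from collections import Counter

# Assumed line shape: "... PERMISSION DENIED to <who> from host <addr> ..."
DENIED = re.compile(r"PERMISSION DENIED to (.+?) from host (\S+)")


def permission_denials(lines):
    """Count PERMISSION DENIED messages per (user, host) pair."""
    counts = Counter()
    for line in lines:
        match = DENIED.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing an open CollectorLog file object and printing permission_denials(log).most_common() lists the most frequently rejected senders first; a host appearing there while you run condor_advertise points to the authentication problem described above.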

SEC_DEFAULT_NEGOTIATION = REQUIRED
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_ENCRYPTION = OPTIONAL
SEC_DEFAULT_INTEGRITY = OPTIONAL
SEC_DEFAULT_AUTHENTICATION_METHODS = FS
SEC_DEFAULT_CRYPTO_METHODS = 3DES, BLOWFISH

Submitting jobs

There is nothing special about submitting a job to a GT4 head node advertised by a ClassAd: use the placeholders as shown above and run condor_submit on your command file as usual. You can also add requirements or rank lines to restrict the machine selection or express preferences, just as you would within a local Condor pool. After the match, condor_q -l <job id> shows the matched GT4 machine, and the job is forwarded to it just as if you had specified the grid_resource line yourself.
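To illustrate how job requirements, job rank and the head node's own Requirements interact, here is a deliberately simplified Python model of two-sided matchmaking. It is not Condor's negotiator: Requirements and Rank are written as Python functions of (my, target) instead of ClassAd expressions, ties are broken by list order, and all names are hypothetical. In a real submit file, the job side would be a line such as requirements = (OpSys == "LINUX").

```python
def grid_universe_only(my, target):
    """Machine-side Requirements of the head-node ad: TARGET.JobUniverse == 9."""
    return target.get("JobUniverse") == 9


def match(job, machines):
    """Return the machine ad preferred by the job among mutual matches, or None.

    A machine is a candidate only if the job's Requirements accept it and
    the machine's Requirements accept the job; among candidates, the one
    with the highest job Rank wins (the first one on ties).
    """
    candidates = [
        m for m in machines
        if job["Requirements"](job, m) and m["Requirements"](m, job)
    ]
    if not candidates:
        return None
    rank = job.get("Rank", lambda my, target: 0.0)
    return max(candidates, key=lambda m: rank(job, m))


if __name__ == "__main__":
    site_a = {"Name": "a.example.org", "OpSys": "LINUX",
              "wisent_GlobusQueue": "prod", "Requirements": grid_universe_only}
    site_b = {"Name": "b.example.org", "OpSys": "LINUX",
              "wisent_GlobusQueue": "test", "Requirements": grid_universe_only}
    job = {
        "JobUniverse": 9,
        "Requirements": lambda my, t: t.get("OpSys") == "LINUX",
        # Prefer sites whose Globus queue is "test".
        "Rank": lambda my, t: 1.0 if t.get("wisent_GlobusQueue") == "test" else 0.0,
    }
    print(match(job, [site_a, site_b])["Name"])
```

Both sites satisfy the job's requirements, so the rank decides and the demo picks b.example.org. A job from any other universe is rejected by both head nodes' Requirements and matches nothing.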

Integration with Grid information systems

The match-making mechanism of Condor is not aware of Grid information systems such as Globus MDS. It is entirely up to you which attributes you insert into the machine ClassAds and where that information comes from. A simple command-line tool for reading the contents of Globus MDS is wsrf-query. Unfortunately, in our experience, the contents of Globus MDS are not very useful for making scheduling decisions. While it is quite easy to find out the number of free nodes using Globus MDS, it is difficult to estimate when, or whether, a submitted job will start running when all CPUs are already occupied. Furthermore, it is impossible to determine from MDS how many of your own jobs are already queued at a particular Grid site (although you may be able to find out with condor_q or by monitoring job logs). Usually, you have little control over the content and quality of the MDS information coming from a remote Grid site, in contrast to information obtained from your own tests or your own configuration database.

An obvious difficulty in using machine ClassAds to describe dynamic properties of Grid resources lies in ensuring that the published information is (at least approximately) up to date. This requires observing state changes of the remote resource. Ideally, such changes would be delivered to the ClassAd publisher as soon as they occur, so that they can be incorporated into the ClassAds. In reality, Globus MDS does not support such notifications. Likewise, Condor lacks APIs for subscribing to notifications about match-making events (for example, to count the jobs assigned to a machine). Accordingly, the less-than-ideal solution of polling MDS and Condor for the required information must be used. Additionally, the Condor daemon and/or job logs can be monitored and parsed to observe relevant state changes.
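The polling approach can be sketched as a small Python generator that reports only actual changes, so that the ClassAd is regenerated only when something is different. The probe function is a hypothetical placeholder for whatever you query (for example, a wrapper around wsrf-query or condor_q); the sketch makes no assumption about their output formats.

```python
import time


def poll_changes(probe, interval, rounds, sleep=time.sleep):
    """Call probe() up to `rounds` times, `interval` seconds apart, and yield
    (round_index, state) whenever the state differs from the previous one.
    """
    previous = object()  # sentinel: unequal to any real first observation
    for i in range(rounds):
        state = probe()
        if state != previous:
            yield i, state
            previous = state
        if i + 1 < rounds:
            sleep(interval)
```

A publisher would loop over poll_changes(...) and rewrite the ClassAd for each yielded state. Keep in mind that an unchanged ad must still be re-posted before its ClassAdLifetime expires, so change detection decides what to publish, not whether to publish.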

Additional links and information