The following sections describe how to set up Condor for use in special environments or configurations. See section on page for installation instructions on the various Contrib modules that can be optionally downloaded and installed.
If you are using AFS at your site, be sure to read section 3.3.7 on ``Shared Filesystem Config Files Entries'' for details on configuring your machines to interact with and use shared filesystems, AFS in particular.
Condor does not currently have a way to authenticate itself to AFS. This is true of the Condor daemons that would like to authenticate as AFS user Condor, and the condor_ shadow, which would like to authenticate as the user who submitted the job it is serving. Since neither of these things can happen yet, there are a number of special things people who use AFS with Condor must do. Some of this must be done by the administrator(s) installing Condor. Some of this must be done by Condor users who submit jobs.
The most important thing is that since the Condor daemons can't authenticate to AFS, the LOCAL_DIR (and it's subdirectories like ``log'' and ``spool'') for each machine must be either writable to unauthenticated users, or must not be on AFS. The first option is a VERY bad security hole so you should NOT have your local directory on AFS. If you've got NFS installed as well and want to have your LOCAL_DIR for each machine on a shared file system, use NFS. Otherwise, you should put the LOCAL_DIR on a local partition on each machine in your pool. This means that you should run condor_ install to install your release directory and configure your pool, setting the LOCAL_DIR parameter to some local partition. When that's complete, log into each machine in your pool and run condor_ init to set up the local Condor directory.
The RELEASE_DIR, which holds all the Condor binaries, libraries and scripts can and probably should be on AFS. None of the Condor daemons need to write to these files, they just need to read them. So, you just have to make your RELEASE_DIR world readable and Condor will work just fine. This makes it easier to upgrade your binaries at a later date, which means that your users can find the Condor tools in a consistent location on all the machines in your pool, and that you can have the Condor config files in a centralized location. This is what we do at UW-Madison's CS department Condor pool and it works quite well.
Finally, you might want to setup some special AFS groups to help your users deal with Condor and AFS better (you'll want to read the section below anyway, since you're probably going to have to explain this stuff to your users). Basically, if you can, create an AFS group that contains all unauthenticated users but that is restricted to a given host or subnet. You're supposed to be able to make these host-based ACLs with AFS, but we've had some trouble getting that working here at UW-Madison. What we have instead is a special group for all machines in our department. So, the users here just have to make their output directories on AFS writable to any process running on any of our machines, instead of any process on any machine with AFS on the Internet.
The condor_ shadow process runs on the machine where you submitted your Condor jobs and performs all file system access for your jobs. Because this process isn't authenticated to AFS as the user who submitted the job, it will not normally be able to write any output. So, when you submit jobs, any directories where your job will be creating output files will need to be world writable (to non-authenticated AFS users). In addition, if your program writes to stdout or stderr, or you're using a user log for your jobs, those files will need to be in a directory that's world-writable.
Any input for your job, either the file you specify as input in your submit file, or any files your program opens explicitly, needs to be world-readable.
Some sites may have special AFS groups set up that can make this unauthenticated access to your files less scary. For example, there's supposed to be a way with AFS to grant access to any unauthenticated process on a given host. That way, you only have to grant write access to unauthenticated processes on your submit machine, instead of any unauthenticated process on the Internet. Similarly, unauthenticated read access could be granted only to processes running on your submit machine. Ask your AFS administrators about the existence of such AFS groups and details of how to use them.
The other solution to this problem is to just not use AFS at all. If you have disk space on your submit machine in a partition that is not on AFS, you can submit your jobs from there. While the condor_ shadow is not authenticated to AFS, it does run with the effective UID of the user who submitted the jobs. So, on a local (or NFS) file system, the condor_ shadow will be able to access your files normally, and you won't have to grant any special permissions to anyone other than yourself. If the Condor daemons are not started as root however, the shadow will not be able to run with your effective UID, and you'll have a similar problem as you would with files on AFS. See the section on ``Running Condor as Non-Root'' for details.
Beginning with Condor version 6.0.1, a single, global configuration file may be used for all platforms in a Condor pool, with only platform-specific settings placed in separate files. This greatly simplifies administration of a heterogeneous pool by allowing changes of platform-independent, global settings in one place, instead of separately for each platform. This is made possible by treating the LOCAL_CONFIG_FILE configuration variable as a list of files, instead of a single file. Of course, this only helps when using a shared file system for the machines in the pool, so that multiple machines can actually share a single set of configuration files.
With multiple platforms, put all platform-independent settings (the vast majority) into the regular condor_config file, which would be shared by all platforms. This global file would be the one that is found with the CONDOR_CONFIG environment variable, the user condor's home directory, or /etc/condor/condor_config.
Then set the LOCAL_CONFIG_FILE configuration variable from that global configuration file to specify both a platform-specific configuration file and optionally, a local, machine-specific configuration file (this parameter is described in section 3.3.3 on ``Condor-wide Configuration File Entries'').
The order of file specification in the LOCAL_CONFIG_FILE configuration variable is important, because settings in files at the beginning of the list are overridden if the same settings occur in files later within the list. So, if specifying the platform-specific file and then the machine-specific file, settings in the machine-specific file would override those in the platform-specific file (as is likely desired).
The name of platform-specific configuration files may be specified by using the ARCH and OPSYS parameters, as are defined automatically by Condor. For example, for Intel Linux machines, and Sparc Solaris 2.6 machines, the files ought to be named:
Then, assuming these three files are in the directory defined by the ETC configuration macro, and machine-specific configuration files are in the same directory, named by each machine's host name, the LOCAL_CONFIG_FILE configuration macro should be:
LOCAL_CONFIG_FILE = $(ETC)/condor_config.$(ARCH).$(OPSYS), \ $(ETC)/$(HOSTNAME).local
Alternatively, when using AFS, an ``@sys link'' may be used to specify the platform-specific configuration file, and let AFS resolve this link differently on different systems. For example, consider a soft link named condor_config.platform that points to condor_config.@sys. In this case, the files might be named:
condor_config.i386_linux2 condor_config.sun4x_56 condor_config.sgi_64 condor_config.platform -> condor_config.@sys
and the LOCAL_CONFIG_FILE configuration variable would be set to:
LOCAL_CONFIG_FILE = $(ETC)/condor_config.platform, \ $(ETC)/$(HOSTNAME).local
The configuration variables that are truly platform-specific are:
Reasonable defaults for all of these configuration variables will be found in the default configuration files inside a given platform's binary distribution (except the RELEASE_DIR, since the location of the Condor binaries and libraries is installation specific). With multiple platforms, use one of the condor_config files from either running condor_ install or from the <release_dir>/etc/examples/condor_config.generic file, take these settings out, save them into a platform-specific file, and install the resulting platform-independent file as the global configuration file. Then, find the same settings from the configuration files for any other platforms to be set up, and put them in their own platform-specific files. Finally, set the LOCAL_CONFIG_FILE configuration variable to point to the appropriate platform-specific file, as described above.
Not even all of these configuration variables are necessarily going to be different. For example, if an installed mail program understands the -s option in /usr/local/bin/mail on all platforms, the MAIL macro may be set to that in the global configuration file, and not define it anywhere else. For a pool with only Digital Unix, the DAEMON_LIST will be the same for each, so there is no reason not to put that in the global configuration file.
It is certainly possible that an installation may want other configuration variables to be platform-specific as well. Perhaps a different policy is desired for one of the platforms. Perhaps different people should get the e-mail about problems with the different platforms. There is nothing hard-coded about any of this. What is shared and what should not shared is entirely configurable.
Since the LOCAL_CONFIG_FILE macro can be an arbitrary list of files, an installation can even break up the global, platform-independent settings into separate files. In fact, the global configuration file might only contain a definition for LOCAL_CONFIG_FILE, and all other configuration variables would be placed in separate files.
Different people may be given different permissions to change different Condor settings. For example, if a user is to be able to change certain settings, but nothing else, those settings may be placed in a file which was early in the LOCAL_CONFIG_FILE list, to give that user write permission on that file, then include all the other files after that one. In this way, if the user was trying to change settings she/he should not, they would simply be overridden.
This mechanism is quite flexible and powerful. For very specific configuration needs, they can probably be met by using file permissions, the LOCAL_CONFIG_FILE configuration variable, and imagination.
In order to take advantage of two major Condor features: checkpointing and remote system calls, users of the Condor system need to relink their binaries. Programs that are not relinked for Condor can run in Condor's ``vanilla'' universe just fine, however, they cannot checkpoint and migrate, or run on machines without a shared filesystem.
To relink your programs with Condor, we provide a special tool, condor_ compile. As installed by default, condor_ compile works with the following commands: gcc, g++, g77, cc, acc, c89, CC, f77, fort77, ld. On Solaris and Digital Unix, f90 is also supported. See the condor_ compile(1) man page for details on using condor_ compile.
However, you can make condor_ compile work transparently with all commands on your system whatsoever, including make.
The basic idea here is to replace the system linker (ld) with the Condor linker. Then, when a program is to be linked, the condor linker figures out whether this binary will be for Condor, or for a normal binary. If it is to be a normal compile, the old ld is called. If this binary is to be linked for condor, the script performs the necessary operations in order to prepare a binary that can be used with condor. In order to differentiate between normal builds and condor builds, the user simply places condor_ compile before their build command, which sets the appropriate environment variable that lets the condor linker script know it needs to do its magic.
In order to perform this full installation of condor_ compile, the following steps need to be taken:
The actual commands that you must execute depend upon the system that you are on. The location of the system linker (ld), is as follows:
Operating System Location of ld (ld-path) Linux /usr/bin Solaris 2.X /usr/ccs/bin OSF/1 (Digital Unix) /usr/lib/cmplrs/cc
On these platforms, issue the following commands (as root), where ld-path is replaced by the path to your system's ld.
mv /[ld-path]/ld /[ld-path]/ld.real cp /usr/local/condor/lib/ld /[ld-path]/ld chown root /[ld-path]/ld chmod 755 /[ld-path]/ld
If you remove Condor from your system latter on, linking will continue to work, since the condor linker will always default to compiling normal binaries and simply call the real ld. In the interest of simplicity, it is recommended that you reverse the above changes by moving your ld.real linker back to it's former position as ld, overwriting the condor linker.
NOTE: If you ever upgrade your operating system after performing a full installation of condor_ compile, you will probably have to re-do all the steps outlined above. Generally speaking, new versions or patches of an operating system might replace the system ld binary, which would undo the full installation of condor_ compile.
The condor keyboard daemon (condor_ kbdd) monitors X events on machines where the operating system does not provide a way of monitoring the idle time of the keyboard or mouse. In particular, this is necessary on Digital Unix machines.
NOTE: If you are running on Solaris, Linux, or HP/UX, you do not need to use the keyboard daemon.
Although great measures have been taken to make this daemon as robust as possible, the X window system was not designed to facilitate such a need, and thus is less then optimal on machines where many users log in and out on the console frequently.
In order to work with X authority, the system by which X authorizes processes to connect to X servers, the condor keyboard daemon needs to run with super user privileges. Currently, the daemon assumes that X uses the HOME environment variable in order to locate a file named .Xauthority, which contains keys necessary to connect to an X server. The keyboard daemon attempts to set this environment variable to various users home directories in order to gain a connection to the X server and monitor events. This may fail to work on your system, if you are using a non-standard approach. If the keyboard daemon is not allowed to attach to the X server, the state of a machine may be incorrectly set to idle when a user is, in fact, using the machine.
In some environments, the keyboard daemon will not be able to connect to the X server because the user currently logged into the system keeps their authentication token for using the X server in a place that no local user on the current machine can get to. This may be the case if you are running AFS and have the user's X authority file in an AFS home directory. There may also be cases where you cannot run the daemon with super user privileges because of political reasons, but you would still like to be able to monitor X activity. In these cases, you will need to change your XDM configuration in order to start up the keyboard daemon with the permissions of the currently logging in user. Although your situation may differ, if you are running X11R6.3, you will probably want to edit the files in /usr/X11R6/lib/X11/xdm. The Xsession file should have the keyboard daemon startup at the end, and the Xreset file should have the keyboard daemon shutdown. As of patch level 4 of Condor version 6.0, the keyboard daemon has some additional command line options to facilitate this. The -l option can be used to write the daemons log file to a place where the user running the daemon has permission to write a file. We recommend something akin to $HOME/.kbdd.log since this is a place where every user can write and won't get in the way. The -pidfile and -k options allow for easy shutdown of the daemon by storing the process id in a file. You will need to add lines to your XDM config that look something like this:
condor_kbdd -l $HOME/.kbdd.log -pidfile $HOME/.kbdd.pid
This will start the keyboard daemon as the user who is currently logging in and write the log to a file in the directory $HOME/.kbdd.log/. Also, this will save the process id of the daemon to /.kbdd.pid, so that when the user logs out, XDM can simply do a:
condor_kbdd -k $HOME/.kbdd.pid
This will shutdown the process recorded in /.kbdd.pid and exit.
To see how well the keyboard daemon is working on your system, review the log for the daemon and look for successful connections to the X server. If you see none, you may have a situation where the keyboard daemon is unable to connect to your machines X server. If this happens, please send mail to firstname.lastname@example.org and let us know about your situation.
The CondorView server is an alternate use of the condor_ collector that logs information on disk, providing a persistent, historical database of pool state. This includes machine state, as well as the state of jobs submitted by users. Historical information logging can be turned on or off, so you can install the CondorView collector without using up disk space for historical information if you do not want it.
The CondorView collector is a condor_ collector that has been specially configured and is running on a different machine from the main condor_ collector. Unfortunately, installing the CondorView collector on a separate host generates more network traffic (from all the duplicate updates that are sent from each machine in your pool to both collectors).
The following sections describe how to configure a machine to run a CondorView server and to configure your pool to send updates to it.
To configure the CondorView collector, you have to add a few settings to the local configuration file of the chosen machine (a separate machine from the main condor_ collector) to enable historical data collection. These settings are described in detail in the Condor Version 6.8.7 Administrator's Manual, in section 3.3.17 on page . A short explanation of the entries you must customize is provided below.
NOTE: This directory should be separate and different from the spool or log directories already set up for Condor. There are a few problems putting these files into either of those directories.
Once these settings are in place in the local configuration file for your CondorView server host, you must to create the directory you specified in POOL_HISTORY_DIR and make it writable by the user your CondorView collector is running as. This is the same user that owns the CollectorLog file in your log directory. The user is usually condor.
After you've configured the CondorView attributes, you must configure Condor to automatically start the CondorView server. You do this by adding VIEW_SERVER to the DAEMON_LIST on this machine and defining what VIEW_SERVER means. For example:
VIEW_SERVER = $(SBIN)/condor_collector DAEMON_LIST = MASTER, STARTD, SCHEDD, VIEW_SERVERFor this change to take effect, you must re-start the condor_ master on this host (which you can do with the condor_ restart command, if you run the command from a machine with administrator access to your pool. (See section 3.6.10 on page for full details of IP/host-based security in Condor).
NOTE: Before you spawn the CondorView server by restarting your condor_ master, you should make sure CONDOR_VIEW_HOST is defined in your configuration (as described in the following section).
For the CondorView server to function, you must configure your pool to send updates to it. You do this by configuring your existing condor_ collector to forward its updates to the CondorView server. All the Condor daemons in the pool send their ClassAd updates to the regular condor_ collector, which in turn will forward them onto the CondorView server.
You do this by defining the following setting in your configuration file:
CONDOR_VIEW_HOST = full.hostnamewhere
full.hostnameis the full hostname of the machine where you are running your CondorView collector.
You should place this setting in your global configuration file, since it should be the same for both the main condor_ collector and the CondorView server. If you do not have a shared global configuration file for Condor, you should put the same value in the configuration files on both the main condor_ collector and the CondorView server host.
Once this setting is in place, you can finally restart the condor_ master at your CondorView server host (or spawn the condor_ master if it is not yet running). Once the CondorView server is running, you can finally send a condor_ reconfig to your main condor_ collector for the change to take effect so it will begin forwarding updates.
Condor jobs are formed from executables that are compiled to execute on specific platforms. This in turn restricts the machines within a Condor pool where a job may be executed. A Condor job may now be executed on a virtual machine system running VMware or Xen. This allows Windows executables to run on a Linux machine, and Linux executables to run on a Windows machine. These virtual machine systems exist for the Intel x86 architecture.
The term virtual machine used in this section is different than use of the term in other parts of Condor, due to the historical evolution of Condor. A virtual machine here describes the environment in which the outside operating system (called the host) emulates an inner operating system (called the inner virtual machine), such that an executable appears to run directly on the inner virtual machine. In other parts of Condor, a virtual machine refers to the multiple CPUs of an SMP machine, or refers to a Java virtual machine.
Under Xen or VMware, Condor has the flexibility to run a job on either the host or the inner virtual machine, hence two platforms appear to exist on a single machine. Since two platforms are an illusion, Condor understands the illusion, allowing a Condor job to be execute on only one at a time.
Condor must be separately installed, separately configured, and separately running on both the host and the inner virtual machine.
The configuration for the host specifies VMP_VM_LIST . This specifies host names or IP addresses of all inner virtual machines running on this host. An example configuration on the host machine:
VMP_VM_LIST = vmware1.domain.com, vmware2.domain.com
The configuration for each separate inner virtual machine specifies VMP_HOST_MACHINE . This specifies the host for the inner virtual machine. An example configuration on an inner virtual machine:
VMP_HOST_MACHINE = host.domain.com
Given this configuration, as well as communication between Condor daemons running on the host and on the inner virtual machine, the policy for when jobs may execute is set by Condor. While the host is executing a Condor job, the START policy on the inner virtual machine is overridden with False, so no Condor jobs will be started on the inner virtual machine. Conversely, while the inner virtual machine is executing a Condor job, the START policy on the host is overridden with False, so no Condor jobs will be started on the host.
The inner virtual machine is further provided with a new syntax for referring to the machine ClassAd attributes of its host. Any machine ClassAd attribute with a prefix of the string HOST_ explicitly refers to the host's ClassAd attributes. The START policy on the inner virtual machine ought to use this syntax to avoid starting jobs when its host is too busy processing other items. An example configuration for START on an inner virtual machine:
START = ( (KeyboardIdle > 150 ) && ( HOST_KeyboardIdle > 150 ) \ && ( LoadAvg <= 0.3 ) && ( HOST_TotalLoadAvg <= 0.3 ) )
This section describes how to configure the condor_ startd for SMP (Symmetric Multi-Processor) machines. Beginning with Condor version 6.1, machines with more than one CPU can be configured to run more than one job at a time. As always, owners of the resources have great flexibility in defining the policy under which multiple jobs may run, suspend, vacate, etc.
The way SMP machines are represented to the Condor system is that the shared resources are broken up into individual virtual machines (each virtual machine is called a VM). Each VM can be matched and claimed by users. Each VM is represented by an individual ClassAd (see the ClassAd reference, section 4.1, for details). In this way, each SMP machine will appear to the Condor system as a collection of separate VMs. As an example, an SMP machine named vulture.cs.wisc.edu would appear to Condor as the multiple machines, named email@example.com, firstname.lastname@example.org, email@example.com, and so on.
The way that the condor_ startd breaks up the shared system resources into the different virtual machines is configurable. All shared system resources (like RAM, disk space, swap space, etc.) can either be divided evenly among all the virtual machines, with each CPU getting its own virtual machine, or you can define your own virtual machine types, so that resources can be unevenly partitioned. Regardless of the partioning scheme used, is important to remember the goal is to create a representative virtual machine ClassAd, to be used for matchmaking with jobs. Condor does not directly enforce virtual machine shared resource allocations, and jobs are free to oversubscribe to shared resources.
Consider an example where two VMs are each defined with 50%of available RAM. The resultant ClassAd for each VM will advertise one half the available RAM. Users may submit jobs with RAM requirements that match these VMs. However, jobs run on either VM are free to consume more than 50%of available RAM. Condor will not directly enforce a RAM utilization limit on either VM. If a shared resource enforcement capability is needed, it is possible to write a Startd policy that will evict a job that oversubscribes to shared resources, see section 3.12.7.
The following section gives details on how to configure Condor to divide the resources on an SMP machine into separate virtual machines.
This section describes the settings that allow you to define your own virtual machine types and to control how many virtual machines of each type are reported to Condor.
There are two main ways to go about partitioning an SMP machine:
Beginning with Condor version 6.1.6, the number of each type being reported can be changed at run-time, by issuing a reconfiguration command to the condor_ startd daemon (sending a SIGHUP or using condor_ reconfig). However, the definitions for the types themselves cannot be changed with reconfiguration. If you change any VM type definitions, you must use condor_ restart
condor_restart -startdfor that change to take effect.
To define your own virtual machine types, add configuration file
parameters that list how much of each system resource you want in the
given VM type. Do this by defining configuration
variables of the form
<N> represents an integer (for example,
VIRTUAL_MACHINE_TYPE_1), which specifies the virtual
machine type defined.
This number is used to configure how many VMs of this type
A type describes what share of the total system resources a given virtual machine has available to it.
The type can be defined by:
The attributes that specify the number of CPUs and the total amount of RAM in the SMP machine do not change. For these attributes, specify either absolute values or percentages of the total available amount. For example, in a machine with 128 Mbytes of RAM, all the following definitions result in the same allocation amount.
mem=64 mem=1/2 mem=50%
Other attributes are dynamic, such as disk space and swap space. For these, specify a percentage or fraction of the total value that is allocated to each VM, instead of specifying absolute values. As the total values of these resources change on your machine, each VM will take its fraction of the total and report that as its available amount.
The four attribute names are case insensitive when defining VM types. The first letter of the attribute name distinguishes between the attributes. The four attributes, with several examples of acceptable names for each are
As an example, consider a host of 4 CPUs and 256 megs of RAM. Here are valid example VM type definitions. Types 1-3 are all equivalent to each other, as are types 4-6
VIRTUAL_MACHINE_TYPE_1 = cpus=2, ram=128, swap=25%, disk=1/2
VIRTUAL_MACHINE_TYPE_2 = cpus=1/2, memory=128, virt=25%, disk=50%
VIRTUAL_MACHINE_TYPE_3 = c=1/2, m=50%, v=1/4, disk=1/2
VIRTUAL_MACHINE_TYPE_4 = c=25%, m=64, v=1/4, d=25%
VIRTUAL_MACHINE_TYPE_5 = 25%
VIRTUAL_MACHINE_TYPE_6 = 1/4
The number of virtual machines of each type is set with the configuration variable NUM_VIRTUAL_MACHINES_TYPE_<N> , where N is the type as given in the VIRTUAL_MACHINE_TYPE_<N>variable.
Note that it is possible to set the configuration variables such that they specify an impossible configuration. If this occurs, the condor_ startd daemon fails after writing a message to its log attempting to indicate the configuration requirements that it could not implement.
If you are not defining your own VM types, then all resources are divided equally among the VMs. The number of VMs within the SMP machine is the only attribute that needs to be defined. Its definition is accomplished by setting the configuration variable NUM_VIRTUAL_MACHINES to the integer number of machines desired. If variable NUM_VIRTUAL_MACHINES is not defined, it defaults to the number of CPUs within the SMP machine.
Section 3.5 details the Startd Policy Configuration. This section continues the discussion with respect to SMP machines.
Each virtual machine within an SMP machine is treated as an independent machine, each with its own view of its machine state. There is a single set of policy expressions for the SMP machine as a whole. This policy may consider the VM state(s) in its expressions. This makes some policies easy to set, but it makes other policies difficult or impossible to set.
An easy policy to set configures how many of the virtual machines notice console or tty activity on the SMP as a whole. VMs that are not configured to notice any activity will report ConsoleIdle and KeyboardIdle times from when the condor_ startd daemon was started, (plus a configurable number of seconds). With this, you can set up a multiple CPU machine with the default policy settings plus add that the keyboard and console noticed by only one virtual machine. Assuming a reasonable load average (see section 3.12.7 below on ``Load Average for SMP Machines''), only the one virtual machine will suspend or vacate its job when the owner starts typing at their machine again. The rest of the virtual machines could be matched with jobs and leave them running, even while the user was interactively using the machine. If the default policy is used, all virtual machines notice tty and console activity and currently running jobs would suspend or preempt.
This example policy is controlled with the following configuration variables.
These configuration variables are fully described in section 3.3.10 on page which lists all the configuration file settings for the condor_ startd.
The configuration of virtual machines allows each VM to advertise its own machine ClassAd. Yet, there is only one set of policy expressions for the SMP machine as a whole. This makes the implementation of certain types of policies impossible. While evaluating the state of one VM (within the SMP machine), the state of other VMs (again within the SMP machine) are not available. Decisions for one VM cannot be based on what other machines within the SMP are doing.
Specifically, the evaluation of a VM policy expression works in the following way.
To set a different policy for the VMs within an SMP machine,
SUSPEND) policy will be of the form
SUSPEND = ( (VirtualMachineID == 1) && (PolicyForVM1) ) || \ ( (VirtualMachineID == 2) && (PolicyForVM2) )where
(PolicyForVM2)are the desired expressions for each VM.
Most operating systems define the load average for an SMP machine as the total load on all CPUs. For example, if you have a 4-CPU machine with 3 CPU-bound processes running at the same time, the load would be 3.0 In Condor, we maintain this view of the total load average and publish it in all resource ClassAds as TotalLoadAvg.
Condor also provides a per-CPU load average for SMP machines. This nicely represents the model that each node on an SMP is a virtual machine, separate from the other nodes. All of the default, single-CPU policy expressions can be used directly on SMP machines, without modification, since the LoadAvg and CondorLoadAvg attributes are the per-virtual machine versions, not the total, SMP-wide versions.
The per-CPU load average on SMP machines is a Condor invention. No system call exists to ask the operating system for this value. Condor already computes the load average generated by Condor on each virtual machine. It does this by close monitoring of all processes spawned by any of the Condor daemons, even ones that are orphaned and then inherited by init. This Condor load average per virtual machine is reported as the attribute CondorLoadAvg in all resource ClassAds, and the total Condor load average for the entire machine is reported as TotalCondorLoadAvg. The total, system-wide load average for the entire machine is reported as TotalLoadAvg. Basically, Condor walks through all the virtual machines and assigns out portions of the total load average to each one. First, Condor assigns the known Condor load average to each node that is generating load. If there's any load average left in the total system load, it is considered an owner load. Any virtual machines Condor believes are in the Owner state (like ones that have keyboard activity), are the first to get assigned this owner load. Condor hands out owner load in increments of at most 1.0, so generally speaking, no virtual machine has a load average above 1.0. If Condor runs out of total load average before it runs out of virtual machines, all the remaining machines believe that they have no load average at all. If, instead, Condor runs out of virtual machines and it still has owner load remaining, Condor starts assigning that load to Condor nodes as well, giving individual nodes with a load average higher than 1.0.
This section describes how the condor_ startd daemon handles its debugging messages for SMP machines. In general, a given log message will either be something that is machine-wide (like reporting the total system load average), or it will be specific to a given virtual machine. Any log entrees specific to a virtual machine have an extra header printed out in the entry: vm#:. So, for example, here's the output about system resources that are being gathered (with D_ FULLDEBUG and D_ LOAD turned on) on a 2-CPU machine with no Condor activity, and the keyboard connected to both virtual machines:
11/25 18:15 Swap space: 131064 11/25 18:15 number of kbytes available for (/home/condor/execute): 1345063 11/25 18:15 Looking up RESERVED_DISK parameter 11/25 18:15 Reserving 5120 kbytes for file system 11/25 18:15 Disk space: 1339943 11/25 18:15 Load avg: 0.340000 0.800000 1.170000 11/25 18:15 Idle Time: user= 0 , console= 4 seconds 11/25 18:15 SystemLoad: 0.340 TotalCondorLoad: 0.000 TotalOwnerLoad: 0.340 11/25 18:15 vm1: Idle time: Keyboard: 0 Console: 4 11/25 18:15 vm1: SystemLoad: 0.340 CondorLoad: 0.000 OwnerLoad: 0.340 11/25 18:15 vm2: Idle time: Keyboard: 0 Console: 4 11/25 18:15 vm2: SystemLoad: 0.000 CondorLoad: 0.000 OwnerLoad: 0.000 11/25 18:15 vm1: State: Owner Activity: Idle 11/25 18:15 vm2: State: Owner Activity: Idle
If, on the other hand, this machine only had one virtual machine connected to the keyboard and console, and the other VM was running a job, it might look something like this:
11/25 18:19 Load avg: 1.250000 0.910000 1.090000 11/25 18:19 Idle Time: user= 0 , console= 0 seconds 11/25 18:19 SystemLoad: 1.250 TotalCondorLoad: 0.996 TotalOwnerLoad: 0.254 11/25 18:19 vm1: Idle time: Keyboard: 0 Console: 0 11/25 18:19 vm1: SystemLoad: 0.254 CondorLoad: 0.000 OwnerLoad: 0.254 11/25 18:19 vm2: Idle time: Keyboard: 1496 Console: 1496 11/25 18:19 vm2: SystemLoad: 0.996 CondorLoad: 0.996 OwnerLoad: 0.000 11/25 18:19 vm1: State: Owner Activity: Idle 11/25 18:19 vm2: State: Claimed Activity: Busy
As you can see, shared system resources are printed without the header (like total swap space), and VM-specific messages (like the load average or state of each VM) get the special header appended.
The STARTD_ATTRS (and legacy STARTD_EXPRS) settings can be configured on a per-VM basis. The condor_ startd daemon builds the list of items to advertise by combining the lists in this order:
For example, consider the following configuration:
STARTD_EXPRS = favorite_color, favorite_season VM1_STARTD_EXRS = favorite_movie VM2_STARTD_EXPRS = favorite_song
This will result in the condor_ startd ClassAd for VM1 defining values for favorite_color, favorite_season, and favorite_movie. VM2 will have values for favorite_color, favorite_season, and favorite_song.
Attributes themselves in the STARTD_EXPRS and STARTD_ATTRS list can also be defined on a per-VM basis. Here is another example:
favorite_color = "blue" favorite_season = "spring" STARTD_EXPRS = favorite_color, favorite_season VM2_favorite_color = "green" VM3_favorite_season = "summer"
For this example, the condor_ startd ClassAds are
favorite_color = "blue" favorite_season = "spring"
favorite_color = "green" favorite_season = "spring"
favorite_color = "blue" favorite_season = "summer"
Applications that require multiple resources, yet must not be preempted, are handled gracefully by Condor. Condor combines opportunistic scheduling and dedicated scheduling within a single system. Opportunistic scheduling involves placing a job on a non-dedicated resource under the assumption that the resource may not be available for the entire duration of the job. Dedicated scheduling assumes the constant availability of resources; it is assumed that the job will run to completion, without interruption.
To support applications needing dedicated resources, an administrator configures resources to be dedicated. These resources are controlled by a dedicated scheduler, a single machine within the pool that runs a condor_ schedd daemon. There is no limit on the number of dedicated schedulers within a Condor pool. However, each dedicated resource may only be managed by a single dedicated scheduler. Running multiple dedicated schedulers within a single pool results in a fragmentation of dedicated resources. This can create a situation where jobs cannot run, because there are too few resource that may be allocated.
After a condor_ schedd daemon has been selected as the dedicated scheduler for the pool and resources are configured to be dedicated, users submit parallel universe jobs (including MPI applications) through that condor_ schedd daemon. When an idle parallel universe job is found in the queue, this dedicated scheduler performs its own scheduling algorithm to find and claim appropriate resources for the job. When a resource can no longer be used to serve a job that must not be preempted, the resource is allowed to run opportunistic jobs.
We recommend that you select a single machine within a Condor pool to act as the dedicated scheduler. This becomes the machine from upon which all users submit their parallel universe jobs. The perfect choice for the dedicated scheduler is the single, front-end machine for a dedicated cluster of compute nodes. For the pool without an obvious choice for a submit machine, choose a machine that all users can log into, as well as one that is likely to be up and running all the time. All of Condor's other resource requirements for a submit machine apply to this machine, such as having enough disk space in the spool directory to hold jobs. See section 3.2.2 on page for details on these issues.
Each machine may have its own policy for the execution of jobs. This policy is set by configuration. Each machine with aspects of its configuration that are dedicated identifies the dedicated scheduler. And, the ClassAd representing a job to be executed on one or more of these dedicated machines includes an identifying attribute. An example configuration file with the following various policy settings is /etc/condor_config.local.dedicated.resource.
Each dedicated machine defines the configuration variable DedicatedScheduler , which identifies the dedicated scheduler it is managed by. The local configuration file for any dedicated resource contains a modified form of
DedicatedScheduler = "DedicatedScheduler@full.host.name" STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
Substitute the host name of the dedicated scheduler
machine for the string "
If running personal Condor, the name of the scheduler includes the user name it was started as, so the configuration appears as:
DedicatedScheduler = "DedicatedScheduler@firstname.lastname@example.org" STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
All dedicated resources must have policy expressions which allow for jobs to always run, but not be preempted. The resource must also be configured to prefer jobs from the dedicated scheduler over all other jobs. Therefore, configuration gives the dedicated scheduler of choice the highest rank. It is worth noting that Condor puts no other requirements on a resource for it to be considered dedicated.
Job ClassAds from the dedicated scheduler contain the attribute Scheduler. dedicated scheduler. The attribute is defined by a string of the form
Scheduler = "DedicatedScheduler@full.host.name"The host name of the dedicated scheduler substitutes for the string "
Different resources in the pool may have different dedicated policies by varying the local configuration.
One possible scenario for the use of a dedicated resource is to only run jobs that require the dedicated resource. To enact this policy, the configure with the following expressions:
START = Scheduler =?= $(DedicatedScheduler) SUSPEND = False CONTINUE = True PREEMPT = False KILL = False WANT_SUSPEND = False WANT_VACATE = False RANK = Scheduler =?= $(DedicatedScheduler)
The START expression specifies that a job with the Scheduler attribute must match the string corresponding DedicatedScheduler attribute in the machine ClassAd. The RANK expression specifies that this same job (with the Scheduler attribute) has the highest rank. This prevents other jobs from preempting it based on user priorities. The rest of the expressions disable all of the condor_ startd daemon's regular policies for evicting jobs when keyboard and CPU activity is discovered on the machine.
While the first example works nicely for jobs requiring dedicated resources, it can lead to poor utilization of the dedicated machines. A more sophisticated strategy allows the machines to run other jobs, when no jobs that require dedicated resources exist. The machine is configured to prefer jobs that require dedicated resources, but not prevent others from running.
To implement this, configure the machine as a dedicated resource (as above) modifying only the START expression:
START = True
A third policy example allows all jobs. These desk-top machines use a preexisting START expression that takes the machine owner's usage into account for some jobs. The machine does not preempt jobs that must run on dedicated resources, while it will preempt other jobs based on a previously set policy. So, the default pool policy is used for starting and stopping jobs, while jobs that require a dedicated resource always start and are not preempted.
The START, SUSPEND, PREEMPT, and RANK policies are set in the global configuration. Locally, the configuration is modified to this hybrid policy by adding a second case.
SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND)) PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT)) RANK_FACTOR = 1000000 RANK = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) \ + $(RANK) START = (Scheduler =?= $(DedicatedScheduler)) || ($(START))
Define RANK_FACTOR to be a larger value than the maximum value possible for the existing rank expression. RANK is just a floating point value, so there is no harm in having a value that is very large.
In some parallel environments, machines are divided into groups, and jobs should not cross groups of machines - that is, all the nodes of a parallel job should be allocated to machines within the same group. The most common example is a pool of machines using infiniband switches. Each switch might connect 16 machines, and a pool might have 160 machines on 10 switches. If the infiniband switches are not routed to each other, each job must run on machines connected to the same switch.
The dedicated scheduler's parallel scheduling groups features supports jobs that must not cross group boundaries. Define a group by having each machine within a group set the configuration variable ParallelSchedulingGroup with a string that is a unique name for the group. The submit description file for a parallel universe job which must not cross group boundaries contains
+WantParallelSchedulingGroups = True
The dedicated scheduler enforces the allocation to within a group.
The dedicated scheduler can optionally preempt running MPI jobs in favor of higher priority MPI jobs in its queue. Note that this is different from preemption in non-parallel universes, and MPI jobs cannot be preempted either by a machine's user pressing a key or by other means.
By default, the dedicated scheduler will never preempt running MPI jobs. Two configuration file items control dedicated preemption: SCHEDD_PREEMPTION_REQUIREMENTS and SCHEDD_PREEMPTION_RANK . These have no default value, so if either are not defined, preemption will never occur. SCHEDD_PREEMPTION_REQUIREMENTS must evaluate to True for a machine to be a candidate for this kind of preemption. If more machines are candidates for preemption than needed to satisfy a higher priority job, the machines are sorted by SCHEDD_PREEMPTION_RANK, and only the highest ranked machines are taken.
Note that preempting one node of a running MPI job requires killing the entire job on all of its nodes. So, when preemption happens, it may end up freeing more machines than strictly speaking are needed. Also, as Condor cannot produce checkpoints for MPI jobs, preempted jobs will be re-run, starting again from the beginning. Thus, the administrator should be careful when enabling dedicated preemption. The following example shows how to enable dedicated preemption.
STARTD_JOB_EXPRS = JobPrio SCHEDD_PREEMPTION_REQUIREMENTS = (My.JobPrio < Target.JobPrio) SCHEDD_PREEMPTION_RANK = 0.0
In this case, preemption is enabled by the user job priority. If a set of machines is running a job at user priority 5, and the user submits a new job at user priority 10, the running job will be preempted for the new job. The old job is put back in the queue, and will begin again from the beginning when assigned to a new set of machines.
In some parallel environments, machines are divided into groups, and jobs should not cross groups of machines - that is, all the nodes of a parallel job should be allocated to machines in the same group. The most common example is a pool of machine using infiniband switches. Each switch might connect 16 machines, and a pool might have 160 machines on 10 switches. If the infiniband switches are not routed to each other, each job must run on machines connected to the same switch. The dedicated scheduler's parallel scheduling groups features supports this operation.
Each startd must define which group it belongs to by setting the ParallelSchedulingGroup property in the config file, and advertising it into the machine classad. The value of this property is simply a string, which should be the same for all startds in a given group. The property must be advertised in the startd job ad by appending ParallelSchedulingGroup into the STARTD_EXPRS configuration variable. Then, parallel jobs which want to be scheduled by group, declare this in their submit file by setting +WantParallelSchedulingGroups=True.
Beginning with Condor version 6.7.17, Condor can be configured to run backfill jobs whenever the condor_ startd has no other work to perform. These jobs are considered the lowest possible priority, but when machines would otherwise be idle, the resources can be put to good use.
Currently, Condor only supports using the Berkeley Open Infrastructure for Network Computing (BOINC) to provide the backfill jobs. More information about BOINC is available at http://boinc.berkeley.edu. Furthermore, Condor currently does not support backfill jobs on windows machines.
The rest of this section will provide an overview of how backfill jobs work in Condor, details for configuring the policy for when backfill jobs are started or killed, and details on how to configure Condor to spawn the BOINC client to perform the work.
Whenever a resource controlled by Condor is in the Unclaimed/Idle state, it is totally idle: neither the interactive user nor a Condor job is performing any work. Machines in this state can be configured to enter the Backfill state, which means the resource will attempt to perform a background computation to keep itself busy until other work arrives (either a user returning to use the machine interactively, or a normal Condor job). Once a resource enters the Backfill state, the condor_ startd will attempt to spawn another program, called a backfill client, to launch and manage the backfill computation. When other work arrives, the condor_ startd will kill the backfill client and clean up any processes it has spawned, freeing the machine resources for the new, higher priority task. More details about the different states a Condor resource can enter and all of the possible transitions between them are described in section 3.5 beginning on page , especially sections 3.5.6, 3.5.7, and 3.5.8.
At this point, the only backfill system supported by Condor is BOINC. The condor_ startd has the ability to start and stop the BOINC client program at the appropriate times, but otherwise provides no additional services to configure the BOINC computations themselves. Future versions of Condor might provide additional functionality to make it easier to manage BOINC computations from within the Condor configuration settings. For now, the BOINC client must be manually installed and configured outside of Condor on each backfill-enabled machine.
There are a small set of policy expressions that determine if a condor_ startd will attempt to spawn backfill jobs at all, and if so, to control the transitions in to and out of the Backfill state. This section briefly lists these expressions. More detail can be found in section 3.3.10 on page .
The following examples show some possible uses of these settings:
# Turn on backfill functionality, and use BOINC ENABLE_BACKFILL = TRUE BACKFILL_SYSTEM = BOINC # Spawn a backfill job if we've been Unclaimed for more than 5 # minutes START_BACKFILL = $(StateTimer) > (5 * $(MINUTE)) # Evict a backfill job if the machine is busy (based on keyboard # activity or cpu load) EVICT_BACKFILL = $(MachineBusy)
The BOINC system is a distributed computing environment for solving large scale scientific problems. A detailed explanation of this system is beyond the scope of this manual. Thorough documentation about BOINC is available at their website: http://boinc.berkeley.edu. However, a brief overview is provided here for sites interested in using BOINC with Condor to manage backfill jobs.
BOINC grew out of the relatively famous SETI@home computation, where volunteers would install special client software (in the form of a screen saver) that would contact a centralized server to download work units. Each work unit contained a set of radio telescope data and the computation tried to find patterns in the data, a sign of intelligent life elsewhere in the universe (hence the name: ``Search for Extra Terrestrial Intelligence at home''). BOINC is developed by the Space Sciences Lab at the University of California, Berkeley, by the same people who created SETI@home. However, instead of being tied to the specific radio telescope application, BOINC is a generic infrastructure where many different kinds of scientific computations can be solved. The current generation of SETI@home now runs on top of BOINC, along with various physics, biology, climatology, and other applications.
The basic computational model for BOINC and the original SETI@home is the same: volunteers install BOINC client software which will run whenever the machine would otherwise be idle. However, the BOINC installation on any given machine must be configured so that it knows what computations to work for (each computation is referred to as a project using BOINC's terminology), instead of always working on a hard coded computation. A given BOINC client can be configured to donate all of its cycles to a single project, or to split the cycles between projects so that, on average, the desired percentage of the computational power is allocated to each project. Once the client software (a program called the boinc_client) starts running, it will attempt to contact a centralized server for each project it has been configured to work for. The BOINC software will download the appropriate platform-specific application binary and some work units from the central server for each project. Whenever the client software completes a given work unit, it will once again attempt to connect to that project's central server to upload the results and download more work.
BOINC participants must register at the centralized server for each project they wish to donate cycles to. The process produces a unique identifier so that the work performed by a given client can be credited to a specific user. BOINC keeps track of the work units completed by each user, so that users providing the most cycles get the highest rankings (and therefore, bragging rights).
Because BOINC already handles the problems of distributing the application binaries for each scientific computation, the work units, and compiling the results, it is a perfect system for managing backfill computations in Condor. Many of the applications that run on top of BOINC do their own application-specific checkpointing, so even if the boinc_client is killed (for example, when a Condor job arrives at a machine, or if the interactive user returns) an entire work unit won't necessarily be lost.
If a working installation of BOINC currently exists on machines where backfill is desired, skip the remainder of this section. Continue reading with the section titled ``Configuring the BOINC client under Condor''.
In Condor Version 6.8.7, the BOINC client software that actually spawns and manages the backfill computations (the boinc_client) must be manually downloaded, installed and configured outside of Condor. Hopefully in future versions, the Condor package will include the boinc_client, and there will be a way to automatically install and configure the BOINC software together with Condor.
The boinc_client executables can be obtained at one of the following locations:
Once the BOINC client software has been downloaded, the boinc_client binary should be placed in a location where the Condor daemons can use it. The path will be specified via a Condor configuration setting, BOINC_Executable , described below.
Additionally, a local directory on each machine should be created where the BOINC system can write files it needs. This directory must not be shared by multiple instances of the BOINC software, just like the spool or execute directories used by Condor. This location of this directory is defined using the BOINC_InitialDir macro, described below. The directory must be writable by whatever user the boinc_client will run as. This user is either the same as the user the Condor daemons are running as (if Condor is not running as root), or a user defined via the BOINC_Owner setting described below.
Finally, Condor administrators wishing to use BOINC for backfill jobs must create accounts at the various BOINC projects they want to donate cycles to. The details of this process vary from project to project. Beware that this step must be done manually, as the BOINC software spawned by Condor (the boinc_client) can not automatically register a user at a given project (unlike the more fancy GUI version of the BOINC client software which many users run as a screen saver). For example, to configure machines to perform work for the Einstein@home project (a physics experiment run by the University of Wisconsin at Milwaukee) Condor administrators should go to http://einstein.phys.uwm.edu/create_account_form.php, fill in the web form, and generate a new Einstein@home identity. This identity takes the form of a project URL (such as http://einstein.phys.uwm.edu) followed by an account key, which is a long string of letters and numbers that is used as a unique identifier. This URL and account key will be needed when configuring Condor to use BOINC for backfill computations (described in the next section).
This section assumes that the BOINC client software has already been installed on a given machine, that the BOINC projects to join have been selected, and that a unique project account key has been created for each project. If any of these steps has not been completed, please read the previous section titled ``Installing the BOINC client software''
Whenever the condor_ startd decides to spawn the boinc_client to perform backfill computations (when ENABLE_BACKFILL is True, when the resource is in Unclaimed/Idle, and when the START_BACKFILL expression evaluates to True), it will spawn a condor_ starter to directly launch and monitor the boinc_client program. This condor_ starter is just like the one used to spawn normal Condor jobs. In fact, the argv of the boinc_client will be renamed to ``condor_ exec'', as described in section 2.15.1 on page .
The condor_ starter for spawning the boinc_client reads values out of the Condor configuration files to define the job it should run, as opposed to getting these values from a job classified ad in the case of a normal Condor job. All of the configuration settings to control things like the path to the boinc_client binary to use, the command-line arguments, the initial working directory, and so on, are prefixed with the string "BOINC_". Each possible setting is described below:
BOINC_Arguments = --attach_project http://einstein.phys.uwm.edu [account_key]
The following example shows one possible usage of these settings:
# Define a shared macro that can be used to define other settings. # This directory must be manually created before attempting to run # any backfill jobs. BOINC_HOME = $(LOCAL_DIR)/boinc # Path to the boinc_client to use, and required universe setting BOINC_Executable = /usr/local/bin/boinc_client BOINC_Universe = vanilla # What initial working directory should BOINC use? BOINC_InitialDir = $(BOINC_HOME) # Save STDOUT and STDERR BOINC_Output = $(BOINC_HOME)/boinc.out BOINC_Error = $(BOINC_HOME)/boinc.err
If the Condor daemons reading this configuration are running as root, an additional macro must be defined:
# Specify the user that the boinc_client should run as: BOINC_Owner = nobody
In this case, Condor would spawn the boinc_client as ``nobody'', so the directory specified in $(BOINC_HOME) would have to be writable by the ``nobody'' user.
A better choice would probably be to create a separate user account just for running BOINC jobs, so that the local BOINC installation is not writable by other processes running as ``nobody''. Alternatively, the BOINC_Owner could be set to ``daemon''.
Attaching to a specific BOINC project
There are a few ways to attach a Condor/BOINC installation to a given BOINC project:
<account> <master_url>[URL]</master_url> <authenticator>[key]</authenticator> </account>
<account> <master_url>http://einstein.phys.uwm.edu</master_url> <authenticator>aaaa1111bbbb2222cccc3333</authenticator> </account>
(Of course, the
<authenticator> tag would use the real
authentication key returned when the account was created at a given
These account files can be copied to the local BOINC directory on all machines in a Condor pool, so administrators can either distribute them manually, or use symbolic links to point to a shared file system.
In the first two cases (using command-line arguments for boinc_client or running the boinc_cmd tool), BOINC will write out the resulting account file to the local BOINC directory on the machine, and then future invocations of the boinc_client will already be attached to the appropriate project(s). More information about participating in multiple BOINC projects can be found at http://boinc.berkeley.edu/multiple_projects.php.