

7.4 Condor on Windows

Will Condor work on a network of mixed Unix and Windows machines?

You can have a Condor pool that consists of both Unix and Windows machines.

Your central manager can run on either Windows or Unix. For example, even a pool consisting strictly of Unix machines could use a Windows machine as its central manager, and vice versa.

Submitted jobs can originate from either a Windows or a Unix machine, and can be destined to run on either a Windows or a Unix machine. Note that there are still restrictions on the universes supported for jobs executed on Windows machines.

So, in summary:

  1. A single Condor pool can consist of both Windows and Unix machines.

  2. It does not matter whether your central manager runs Unix or Windows.

  3. Unix machines can submit jobs to run on other Unix or Windows machines.

  4. Windows machines can submit jobs to run on other Windows or Unix machines.

What versions of Windows will Condor run on?

See Section 1.5.

My Windows program works fine when executed on its own, but it does not work when submitted to Condor.

First, make sure that the program really does work outside of Condor under Windows, that the disk is not full, and that the system is not out of user resources.

Next, be aware that some Windows programs do not run properly because they are dynamically linked and cannot find the .dll files they depend on. Version 6.4.x of Condor sets the PATH to be empty when running a job. To avoid these difficulties, do one of the following:

  1. statically link the application
  2. wrap the job in a script that sets up the environment
  3. submit the job from a correctly-set environment with the command
    getenv = true
    in the submit description file. This will copy your environment into the job's environment.
  4. send the required .dll files along with the job using the submit description file command transfer_input_files.
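Option 2 can be sketched as a small wrapper. This is a minimal Python sketch, not something shipped with Condor; it assumes the execute machine can run Python scripts and that the wrapper receives the real program and its arguments on its command line. The directories in EXTRA_PATH_DIRS and the name run_with_env are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical install locations of the DLLs the job depends on; adjust for your site.
EXTRA_PATH_DIRS = [r"C:\MyApp\bin", r"C:\MyApp\lib"]


def run_with_env(command, extra_dirs=EXTRA_PATH_DIRS, extra_vars=None):
    """Run `command` (a list of strings) with an extended environment.

    The directories in `extra_dirs` are placed at the front of the child's
    PATH, which Windows consults when searching for DLLs. Returns the child's
    exit code so the wrapper can hand it back to Condor.
    """
    env = dict(os.environ)
    existing = env.get("PATH", "")
    parts = list(extra_dirs) + ([existing] if existing else [])
    env["PATH"] = os.pathsep.join(parts)
    if extra_vars:
        env.update(extra_vars)  # any other variables the program expects
    return subprocess.call(command, env=env)


# Usage: python wrapper.py C:\path\to\program.exe arg1 arg2 ...
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_with_env(sys.argv[1:]))
```

Pass the program by its full path: the extended PATH is what the child uses for DLL lookup, but it is not necessarily what is used to locate the program itself.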

Why is the condor_master daemon failing to start, giving an error about "In StartServiceCtrlDispatcher, Error number: 1063"?

In Condor for Windows, the condor_master daemon is started as a service. Therefore, starting the condor_master daemon as you would on Unix will not work. Start Condor on Windows machines using either
	net start condor
or start the Condor service from the Service Control Manager located in the Windows Control Panel.

Jobs submitted from Windows give an error referring to a credential.

Jobs submitted from a Windows machine require a stashed password in order for Condor to perform certain operations on the user's behalf. Refer to section 6.2.3 for information about password storage on Windows. The command that stashes a password for a user is condor_store_cred. See the condor_store_cred manual page for usage details.

The error message that Condor gives if a user has not stashed a password is of the form:

ERROR: No credential stored for username@machinename

        Correct this by running:
	        condor_store_cred add

Jobs submitted from Unix to execute on Windows do not work properly.

A difficulty with defaults causes jobs submitted from Unix for execution on a Windows platform to remain in the queue, but make no progress. For jobs with this problem, log files will contain error messages pointing to shadow exceptions.

This difficulty stems from the defaults for whether file transfer takes place. The workaround for this problem is to place the line

into the submit description file for jobs submitted from a Unix machine for execution on a Windows machine.

When I run condor_status, I get a communication error, or the Condor daemon log files report a failure to bind.

Condor uses the first network interface it sees on your machine. This problem usually means you have an extra, inactive network interface (such as a RAS dial-up interface) defined before your regular network interface.

To solve this problem, either change the order of your network interfaces in the Control Panel, or explicitly set which network interface Condor should use by adding the following parameter to your Condor configuration file:


Where ip-address is the IP address of the interface you wish Condor to use.
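To our understanding, the parameter meant here is Condor's NETWORK_INTERFACE setting. The Python sketch below is a hypothetical helper, not part of Condor: given the addresses you have read off your adapters (for example from ipconfig), it skips loopback, unset and link-local (169.254.x.x) addresses, which an inactive adapter may carry, and formats the configuration line. The skipping heuristic is our assumption; always confirm the chosen address by hand.

```python
import ipaddress


def pick_interface_address(candidates):
    """Return the first usable IPv4 address from `candidates`, or None.

    Loopback, unspecified (0.0.0.0) and link-local (169.254.x.x) addresses
    are skipped, since an adapter that is not connected may report one.
    """
    for text in candidates:
        try:
            ip = ipaddress.ip_address(text)
        except ValueError:
            continue  # not an IP address at all
        if ip.version != 4:
            continue
        if ip.is_loopback or ip.is_unspecified or ip.is_link_local:
            continue
        return str(ip)
    return None


def network_interface_line(address):
    """Format the configuration line, validating the address first."""
    return "NETWORK_INTERFACE = {}".format(ipaddress.ip_address(address))
```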

My job starts but exits right away with status 128.

This can occur when the machine your job is running on is missing a DLL (Dynamically Linked Library) required by your program. The solution is to find the DLL file the program needs and put it in the TRANSFER_INPUT_FILES list in the job's submit file.

To find out what DLLs your program depends on, right-click the program in Explorer, choose Quickview, and look under "Import List".
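Once the DLLs are known, the submit description file can list them. The Python sketch below only builds the text of a minimal submit description file; the function name and the program and DLL names are hypothetical, and any other commands your pool requires (for example, ones that enable file transfer) are deliberately omitted.

```python
def submit_description(executable, dll_files, getenv=False):
    """Build a minimal submit description file as a string.

    `dll_files` are sent along with the job via transfer_input_files, so they
    arrive in the job's working directory on the execute machine.
    """
    lines = [
        "universe   = vanilla",
        "executable = {}".format(executable),
    ]
    if dll_files:
        lines.append("transfer_input_files = " + ", ".join(dll_files))
    if getenv:
        # Copies the submitter's environment into the job's environment.
        lines.append("getenv = true")
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical program and DLL names.
    print(submit_description("myprog.exe", ["helper.dll", "mathlib.dll"]), end="")
```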

How can I access network files with Condor on Windows?

Five methods for making access of network files work with Condor are given in section 6.2.7.

What is wrong when condor_off cannot find my host, and condor_status does not give me a complete host name?

Given the command

  condor_off hostname2
an error message of the form
  Can't find address for master
appears. Yet, when looking at the host names with
  condor_status -master
the output is of the form

To correct this incomplete host name, add an entry to the configuration file for DEFAULT_DOMAIN_NAME that specifies the domain name to be used. For the example given, the configuration entry will be


After adding this configuration file entry, use condor_restart to restart the Condor daemons and effect the change.
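The effect of DEFAULT_DOMAIN_NAME can be modeled roughly as appending the domain to any host name that lacks one. The Python sketch below is that simplified model under our own assumption, not Condor's actual code; "example.edu" is a hypothetical domain.

```python
def qualify_host(name, default_domain):
    """Append `default_domain` to a host name that has no domain part.

    A simplified model of what DEFAULT_DOMAIN_NAME achieves.
    """
    name = name.rstrip(".")
    domain = default_domain.strip(".")
    if not domain or "." in name:
        return name  # already qualified, or nothing to append
    return "{}.{}".format(name, domain)


def default_domain_line(domain):
    """Format the configuration file entry for `domain`."""
    return "DEFAULT_DOMAIN_NAME = {}".format(domain.strip("."))
```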

Does USER_JOB_WRAPPER work on Windows machines?

The USER_JOB_WRAPPER configuration variable does work on Windows machines. The wrapper must be either a batch script with a file extension of .bat or .cmd, or an executable with a file extension of .exe or .com.

An example of a batch script that sets environment variables:

REM set some environment variables

REM Run the actual job now
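A wrapper of this shape can also be generated and checked programmatically. The Python sketch below is hypothetical: it checks a wrapper path against the four extensions listed above and writes a batch wrapper whose variable names are placeholders. It assumes Condor passes the job's executable and arguments to the wrapper, so that %* (all of the batch file's arguments) runs the job; "@echo off" is our addition.

```python
import ntpath

# Extensions accepted for USER_JOB_WRAPPER on Windows, per the answer above.
WRAPPER_EXTENSIONS = {".bat", ".cmd", ".exe", ".com"}


def is_valid_wrapper_name(path):
    """True if `path` has an extension accepted for a Windows job wrapper.

    ntpath splits Windows-style paths correctly on any OS, and the
    comparison ignores case, as Windows file names do.
    """
    return ntpath.splitext(path)[1].lower() in WRAPPER_EXTENSIONS


def batch_wrapper(env_vars):
    """Return the text of a .bat/.cmd wrapper that sets `env_vars`, then runs the job."""
    lines = ["@echo off", "REM set some environment variables"]
    for name, value in env_vars.items():
        lines.append('set "{}={}"'.format(name, value))
    # %* expands to every argument given to the wrapper: the job and its arguments.
    lines += ["", "REM Run the actual job now", "%*"]
    return "\r\n".join(lines) + "\r\n"
```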

condor_store_cred is failing, and I'm sure I'm typing my password correctly.

First, make sure the condor_schedd is running.

Next, check the SchedLog, which contains more detailed information about the failure. Frequently, the failure is the result of PERMISSION DENIED errors. You can read more about properly configuring security settings in the security section of this manual.

My submit machine cannot have more than 120 jobs running concurrently. Why?

Windows is likely running out of desktop heap. Confirm this by looking in the log for the condor_schedd daemon to see whether condor_shadow daemons are exiting immediately with status 128. If so, increase the desktop heap size. Open the registry key:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Window

The SharedSection value holds three numbers separated by commas. The third number controls the desktop heap size for non-interactive desktops, which the Condor service uses. The default is 512 (Kbytes). Since 60 condor_shadow daemons consume about 256 Kbytes, about 120 shadows can run with the default value. To be able to run a maximum of 300 condor_shadow daemons, set this value to 1280.

Reboot the system for the changes to take effect. For more information, see Microsoft Article Q184802.
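The sizing arithmetic, and the edit to the SharedSection numbers, can be sketched in Python. The ratio of about 256 Kbytes per 60 shadows is the FAQ's own estimate. The layout of the registry value text (a "SharedSection=a,b,c" token among other space-separated settings) is an assumption to check against your own system, and the sketch only edits a string; it does not touch the registry.

```python
import re

# Ratio quoted above: about 256 KB of non-interactive desktop heap
# per 60 condor_shadow processes.
KB_PER_60_SHADOWS = 256


def max_shadows(heap_kb):
    """Approximate number of condor_shadow processes a heap of `heap_kb` KB allows."""
    return heap_kb * 60 // KB_PER_60_SHADOWS


def heap_needed(n_shadows):
    """Smallest heap size in KB that the same ratio says supports `n_shadows`."""
    return -(-n_shadows * KB_PER_60_SHADOWS // 60)  # ceiling division


def set_noninteractive_heap(windows_value, heap_kb):
    """Replace the third SharedSection number in the registry value text."""
    match = re.search(r"SharedSection=(\d+),(\d+),(\d+)", windows_value)
    if match is None:
        raise ValueError("no three-part SharedSection token found")
    start, end = match.span(3)  # position of the third number
    return windows_value[:start] + str(heap_kb) + windows_value[end:]
```

For example, max_shadows(512) gives the 120 shadows mentioned above, and heap_needed(300) gives 1280.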
