Tuesday, 29 June 2010

Running a command/script on all machines in a Condor pool

Sadly, not all the machines in our Condor pool have exactly the same configuration, so it is sometimes convenient to run a command or script on every machine (mostly to find the misbehaving ones). Since there is apparently no built-in way of doing this with Condor, I wrote some quick Perl scripts to do it (in this case, to find which machines cannot run Octave).

cycle_all_condor.pl takes a constraint string and a basic Condor command file. For example, I run it like this:

[angelv]$ ./cycle_all_condor.pl 'Arch == "X86_64"' basic.file

in order to select all the 64-bit machines in our pool and use basic.file as the prefix of the resulting Condor submit file. The submit file is simply written to standard output (which can be redirected to a file and then submitted with Condor).
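As a rough illustration of what follows the basic.file prefix in the generated submit file, here is a minimal shell sketch of the per-machine part. The function name queue_entries and the hostnames are made up for the example; in the real script the machine list comes from condor_status.

```shell
# Hypothetical helper: read one hostname per line on stdin and print one
# Requirements/Queue pair per machine, mirroring what cycle_all_condor.pl
# appends after the contents of basic.file.
queue_entries() {
    while IFS= read -r machine; do
        [ -n "$machine" ] || continue   # skip blank lines
        printf 'Requirements = Machine == "%s"\n' "$machine"
        printf 'Queue\n'
    done
}

# Made-up hostnames stand in for the condor_status output.
printf 'node01.example.org\nnode02.example.org\n' | queue_entries
```

Each Queue statement submits one job restricted to a single machine, so the pool ends up running the executable once per host. To use the real output, redirect it to a file of your choosing and pass that file to condor_submit.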

The scripts are very simple, and probably written in rather poor Perl, but they do the job. Here they are:

[angelv]$ cat cycle_all_condor.pl
#! /usr/bin/perl

# Usage: cycle_all_condor.pl '<constraint>' <basic command file>
$constraints = $ARGV[0];
$basic_cmd_file = $ARGV[1];

system "cat $basic_cmd_file";

open (MACHINES, "condor_status -const '$constraints' -format \"%s\n\" Machine | sort -u |");

while (<MACHINES>) {
    chomp;  # drop the trailing newline so the closing quote stays on the same line
    print "Requirements = Machine == \"$_\" \n";
    print "Queue \n";
}

close (MACHINES);

[angelv]$ cat basic.file
Executable = octave.sh
Universe = vanilla
output = output.$(Process)
error = error.$(Process)
log = log.condor
notification = NEVER
getenv = True

[angelv]$ cat octave.sh
uname -a
octave --help

[angelv]$ cat results.pl

# Show every non-empty error file together with the matching output file.
@files = <error*>;
foreach $file (@files) {
    if (-s $file) {
        ($name,$ext) = split(/\./,$file);
        print "$file \n";
        system "cat $file output.$ext";
    }
}
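For comparison, the same check can be sketched as a small shell function. This is only an illustrative equivalent, not part of the original scripts; the name report_failures and the optional directory argument are assumptions made for the example.

```shell
# Hypothetical helper: for every non-empty error.N in the given directory
# (default: current directory), print its name, then the contents of
# error.N and of the matching output.N if that file exists.
report_failures() {
    dir=${1:-.}
    for err in "$dir"/error.*; do
        [ -s "$err" ] || continue      # skip empty files and an unmatched glob
        n=${err##*.}                   # process number after the last dot
        printf '%s\n' "$err"
        cat "$err"
        if [ -f "$dir/output.$n" ]; then
            cat "$dir/output.$n"
        fi
    done
    return 0
}
```

Since octave.sh runs uname -a first, the output file printed next to each error identifies the machine on which Octave failed.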
