From: David Holland <dholland AT eecs.harvard.edu>
Date: Fri, 1 Aug 2008 21:14:19 -0400
In reply to: [provenance-challenge] Submission of workflows for Provenance Challenge 3 (pgroth AT isi.edu)

On Fri, Aug 01, 2008 at 02:31:36PM -0700, Paul Groth wrote:
> At the Open Provenance Model Workshop, we had agreed that it would be
> advantageous for the Third Provenance Challenge to have additional
> workflows beyond the Brain Atlas workflow used in previous challenges.
> To that end, it was decided that a number of teams would propose new
> workflows by August 1.

I've been looking at preparing a small compile workload. This turns
out to not be entirely trivial; most compiles are very large compared
to what most groups are prepared to cope with in this context, and
really small ones are inherently not very interesting.

There are also some issues with portability, and also, most compiles
already have a workflow engine (make) and interesting ones tend to
generate parts of the makefile on the fly. This makes replacing the
makefile with a workflow specification problematic, but just having
the workflow engine run make turns the whole thing into one big glob.

I think the solution to this is to have the workload specification
know what the outputs are going to be and ask successive make runs to
deliver them one at a time. It is not all that natural, but it should
serve.

Along these lines, I have a small workload that's a drastically cut
down build of the toy kernel we use for teaching our OS course. It
currently has five phases: tree configure, kernel configure, make
depend, compile, and link.

 - tree configure runs a "configure" script to generate
   "treedefs.mk", which is used by all later make invocations.

 - kernel configure runs the kernel config script on a kernel config
   file and a sources list to generate these files:
      autoconf.c autoconf.h opt-sfs.h defs.mk files.mk

 - make depend uses the .mk files, a Makefile, 14 .h files, 11 .c
   files, plus autoconf.[ch], to generate depend.mk.

 - the compile phase compiles each of the 11 .c files, plus
   autoconf.c, using the header files as well and all the make bits,
   to make 12 .o files.

 - the link phase creates a "kernel" image from the 12 .o files using
   the make bits, an extra shell script, and an extra generated .c
   file that might or might not be worth modeling independently.

I have cut out all the machine-dependent goop so it should be
compilable on any reasonable platform, even Windows (although you'll
need cygwin or equivalent to run the scripts) and ought to even
compile with almost any compiler, I think, although it still needs
some tweaking.

It also has a variant form (a different kernel config) and I have some
queries in mind although I haven't written them up yet. Nor have I
written up the workflow itself in any more detail than the above.

So. Is this workload too large? It is a bit more than twice the size
of the original challenge workload. I can cut it down a bit, but not
that much. If it is going to be too large I don't want to put any more
effort into it...

Opinions?

(If anyone wants to look at it, I've stuck the files here:
http://www.eecs.harvard.edu/~dholland/tmp/challenge3/. To try running
it, do ./configure; ./config GENERIC; make depend; make.)

-- 
   - David A. Holland / dholland AT eecs.harvard.edu
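
A rough sketch of the idea described above, in which the workflow
specification knows the expected outputs and asks make for them one
at a time, might look like the following. It is only an illustration
under stated assumptions: the message gives just the commands
./configure, ./config GENERIC, make depend, and make, so the
per-object and "kernel" make targets, the source file names, and the
helpers build_plan and run_plan are all hypothetical.

```python
import os
import subprocess

def build_plan(sources, config="GENERIC"):
    """Return (phase, argv) steps covering the five phases in order.

    Assumption: the Makefile accepts each .o file and "kernel" as a
    goal on the command line; the message does not confirm this.
    """
    plan = [
        ("tree configure", ["./configure"]),
        ("kernel configure", ["./config", config]),
        ("make depend", ["make", "depend"]),
    ]
    # One make run per object file, plus the generated autoconf.c.
    objects = [os.path.splitext(src)[0] + ".o" for src in sources]
    objects.append("autoconf.o")
    for obj in objects:
        plan.append(("compile", ["make", obj]))
    plan.append(("link", ["make", "kernel"]))
    return plan

def run_plan(plan, run=subprocess.check_call):
    """Execute each step in order; `run` is injectable so a workflow
    engine (or a test) can record the step instead of running it."""
    for phase, argv in plan:
        run(argv)

if __name__ == "__main__":
    # Placeholder names standing in for the 11 real .c files.
    for phase, argv in build_plan(["main.c", "thread.c"]):
        print(phase, " ".join(argv))
```

Each step then corresponds to one make invocation with one expected
output, which is the granularity a provenance system would record.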