From owner-provenance-challenge@ipaw.info Mon Nov 20 13:20:23 2006 received: from [127.0.0.1] (login.ecs.soton.ac.uk [IPv6:2001:630:d0:f102:230:48ff:fe23:58df]) (authenticated bits=0) by imap.ecs.soton.ac.uk (8.13.1/8.13.1) with ESMTP id kAKDJcaI013587; Mon, 20 Nov 2006 13:19:38 GMT message-id: <4561AB94.9030804@ecs.soton.ac.uk> date: Mon, 20 Nov 2006 13:20:20 +0000 from: Luc Moreau user-agent: Thunderbird 1.5.0.8 (Windows/20061025) mime-version: 1.0 to: provenance-challenge@ipaw.info subject: [provenance-challenge] editorial: third draft content-type: text/plain; charset=ISO-8859-1; format=flowed content-transfer-encoding: 7bit x-null-tag: 926fcabda103046d83e2d0731d5abbac x-ecs-mailscanner: Found to be clean, Found to be clean, Found to be clean x-ecs-mailscanner-information: Please contact the ISP for more information x-ecs-mailscanner-from: l.moreau@ecs.soton.ac.uk x-greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (crow.ecs.soton.ac.uk [IPv6:2001:630:d0:f110:230:48ff:fe29:c3cd]); Mon, 20 Nov 2006 13:19:52 +0000 (GMT) x-mailscanner-information: Please contact helpdesk@ecs.soton.ac.uk for more information x-mailscanner-from: l.moreau@ecs.soton.ac.uk x-spam-status: No sender: owner-provenance-challenge@ipaw.info precedence: bulk reply-to: provenance-challenge@ipaw.info All, (Please read!) The third draft of the challenge editorial is available from: http://twiki.ipaw.info/pub/Challenge/SpecialIssueIntroduction/challenge-editorial.pdf It's a near-impossible job to reconcile all the comments I have received. I have tried to do my best to include as many comments as I could. Following Bertram's suggestions, three matrix rows have been redefined, for which I solicit your input (*deadline is Thursday 23*). 
- 1.6: we now make the distinction between run/partial/simulated - 2.3 explicitly indicates whether annotations are in scope and whether they are supported for the queries - 2.4 explicitly indicates whether time is supported for the queries, and whether it is required for representing provenance. Also, some entries are empty for 2.5, 2.6 and 2.7, could you tell me how to fill the blanks. Cheers, Luc -- Professor Luc Moreau Electronics and Computer Science tel: +44 23 8059 4487 University of Southampton fax: +44 23 8059 2865 Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk United Kingdom http://www.ecs.soton.ac.uk/~lavm From owner-provenance-challenge@ipaw.info Thu Nov 23 12:20:03 2006 received: from [127.0.0.1] (login.ecs.soton.ac.uk [IPv6:2001:630:d0:f102:230:48ff:fe23:58df]) (authenticated bits=0) by imap.ecs.soton.ac.uk (8.13.1/8.13.1) with ESMTP id kANCJO44022809; Thu, 23 Nov 2006 12:19:24 GMT message-id: <456591FB.5090707@ecs.soton.ac.uk> date: Thu, 23 Nov 2006 12:20:11 +0000 from: Luc Moreau user-agent: Thunderbird 1.5.0.8 (Windows/20061025) mime-version: 1.0 to: Luc Moreau , provenance-challenge@ipaw.info subject: [provenance-challenge] Re: editorial: third draft references: <4561AB94.9030804@ecs.soton.ac.uk> in-reply-to: <4561AB94.9030804@ecs.soton.ac.uk> content-type: text/plain; charset=ISO-8859-1; format=flowed content-transfer-encoding: 7bit x-null-tag: 111cc50a3a60da107e0acb83bc034c5a x-ecs-mailscanner: Found to be clean, Found to be clean, Found to be clean x-ecs-mailscanner-information: Please contact the ISP for more information x-ecs-mailscanner-from: l.moreau@ecs.soton.ac.uk x-greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (crow.ecs.soton.ac.uk [IPv6:2001:630:d0:f110:230:48ff:fe29:c3cd]); Thu, 23 Nov 2006 12:19:39 +0000 (GMT) x-mailscanner-information: Please contact helpdesk@ecs.soton.ac.uk for more information x-mailscanner-from: l.moreau@ecs.soton.ac.uk x-spam-status: No sender: owner-provenance-challenge@ipaw.info 
precedence: bulk reply-to: provenance-challenge@ipaw.info All, Just a kind reminder: I'd be grateful to receive your input for the new matrix entries today. Thanks, Luc Professor Luc Moreau Electronics and Computer Science tel: +44 23 8059 4487 University of Southampton fax: +44 23 8059 2865 Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk United Kingdom http://www.ecs.soton.ac.uk/~lavm Luc Moreau wrote: > All, (Please read!) > > The third draft of the challenge editorial is available from: > http://twiki.ipaw.info/pub/Challenge/SpecialIssueIntroduction/challenge-editorial.pdf > > > It's a near-impossible job to reconcile all the comments I have > received. I have tried > to do my best to include as many comments as I could. > > Following Bertram's suggestions, three matrix rows have been redefined, > for which I solicit > your input (*deadline is Thursday 23*). > - 1.6: we now make the distinction between run/partial/simulated > - 2.3 explicitly indicates whether annotations are in scope and > whether they are supported for the queries > - 2.4 explicitly indicates whether time is supported for the queries, > and whether it is required for representing provenance. > > Also, some entries are empty for 2.5, 2.6 and 2.7, could you tell me > how to fill the blanks. 
> > Cheers, > Luc > From owner-provenance-challenge@ipaw.info Thu Nov 23 20:15:18 2006 received: from NA-EXMSG-C105.redmond.corp.microsoft.com ([157.54.52.48]) by tk1-exhub-c103.redmond.corp.microsoft.com ([157.56.116.114]) with mapi; Thu, 23 Nov 2006 12:10:41 -0800 from: Roger Barga to: Luciano Digiampietri , "provenance-challenge@ipaw.info" date: Thu, 23 Nov 2006 12:10:24 -0800 subject: RE: [provenance-challenge] editorial: third draft thread-topic: [provenance-challenge] editorial: third draft thread-index: AccPNNSAwkVf9/YnSl24nEZiCRvn3QABpSHF message-id: <9EC860F4FAD34D4091536873D150B9951A18521DFB@NA-EXMSG-C105.redmond.corp.microsoft.com> references: <4561AB94.9030804@ecs.soton.ac.uk>, in-reply-to: accept-language: en-US content-language: en-US x-ms-has-attach: x-ms-tnef-correlator: acceptlanguage: en-US content-type: text/plain; charset="iso-8859-1" mime-version: 1.0 x-greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-2.0.2 (moorhen.ecs.soton.ac.uk [152.78.68.178]); Thu, 23 Nov 2006 20:11:33 +0000 (GMT) x-null-tag: e749796c6756c64ca5c3e364fdde4607 x-mailscanner-information: Please contact helpdesk@ecs.soton.ac.uk for more information x-ecs-mailscanner: Found to be clean x-ecs-spamscore: s x-mailscanner-from: barga@microsoft.com x-spam-status: No content-transfer-encoding: 8bit x-mime-autoconverted: from quoted-printable to 8bit by seer.ecs.soton.ac.uk id kANKFDsg023047 sender: owner-provenance-challenge@ipaw.info precedence: bulk reply-to: provenance-challenge@ipaw.info Luciano, thanks for taking the initiative on this! roger ________________________________ From: Luciano Digiampietri [luciano.digiampietri@gmail.com] Sent: Thursday, November 23, 2006 11:23 AM To: provenance-challenge@ipaw.info; Roger Barga Subject: Re: [provenance-challenge] editorial: third draft Hi Luc, I am working with Roger Barga in the Redux framework. 
We want to update two empty fields: 2.3 Arbitrary annotations in scope/implemented => yes 2.7 Abstraction mechanism => layered provenance model Thanks, -- Luciano Antonio Digiampietri Doutorando em Ciencia da Computacao, UNICAMP http://www.ic.unicamp.br/~luciano On 11/20/06, Luc Moreau wrote: All, (Please read!) The third draft of the challenge editorial is available from: http://twiki.ipaw.info/pub/Challenge/SpecialIssueIntroduction/challenge-editorial.pdf It's a near-impossible job to reconcile all the comments I have received. I have tried to do my best to include as many comments as I could. Following Bertram's suggestions, three matrix rows have been redefined, for which I solicit your input (*deadline is Thursday 23*). - 1.6: we now make the distinction between run/partial/simulated - 2.3 explicitly indicates whether annotations are in scope and whether they are supported for the queries - 2.4 explicitly indicates whether time is supported for the queries, and whether it is required for representing provenance. Also, some entries are empty for 2.5, 2.6 and 2.7, could you tell me how to fill the blanks. 
Cheers, Luc -- Professor Luc Moreau Electronics and Computer Science tel: +44 23 8059 4487 University of Southampton fax: +44 23 8059 2865 Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk United Kingdom http://www.ecs.soton.ac.uk/~lavm From owner-provenance-challenge@ipaw.info Mon Nov 27 09:16:29 2006 received: from [127.0.0.1] (login.ecs.soton.ac.uk [IPv6:2001:630:d0:f102:230:48ff:fe23:58df]) (authenticated bits=0) by imap.ecs.soton.ac.uk (8.13.1/8.13.1) with ESMTP id kAR9FnGg003352; Mon, 27 Nov 2006 09:15:49 GMT message-id: <456AACFC.4000001@ecs.soton.ac.uk> date: Mon, 27 Nov 2006 09:16:44 +0000 from: Luc Moreau user-agent: Thunderbird 1.5.0.8 (Windows/20061025) mime-version: 1.0 to: Luciano Digiampietri cc: provenance-challenge@ipaw.info, Roger Barga subject: Re: [provenance-challenge] editorial: third draft references: <4561AB94.9030804@ecs.soton.ac.uk> in-reply-to: content-type: text/plain; charset=ISO-8859-1; format=flowed content-transfer-encoding: 7bit x-null-tag: 17880566acec8de50d8568d6c1703444 x-ecs-mailscanner: Found to be clean, Found to be clean, Found to be clean x-ecs-mailscanner-information: Please contact the ISP for more information x-ecs-mailscanner-from: l.moreau@ecs.soton.ac.uk x-greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (crow.ecs.soton.ac.uk [IPv6:2001:630:d0:f110:230:48ff:fe29:c3cd]); Mon, 27 Nov 2006 09:16:05 +0000 (GMT) x-mailscanner-information: Please contact helpdesk@ecs.soton.ac.uk for more information x-mailscanner-from: l.moreau@ecs.soton.ac.uk x-spam-status: No sender: owner-provenance-challenge@ipaw.info precedence: bulk reply-to: provenance-challenge@ipaw.info Thanks Luciano. What about time? It is supported clearly, but is it required in your provenance representation? 
Luc Professor Luc Moreau Electronics and Computer Science tel: +44 23 8059 4487 University of Southampton fax: +44 23 8059 2865 Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk United Kingdom http://www.ecs.soton.ac.uk/~lavm Luciano Digiampietri wrote: > Hi Luc, > > I am working with Roger Barga in the Redux framework. > > We want to update two empty fields: > 2.3 Arbitrary annotations in scope/implemented => yes > 2.7 Abstraction mechanism => layered provenance model > > Thanks, > > -- > Luciano Antonio Digiampietri > Doutorando em Ciencia da Computacao, UNICAMP > http://www.ic.unicamp.br/~luciano > > > > On 11/20/06, *Luc Moreau* > wrote: > > All, (Please read!) > > The third draft of the challenge editorial is available from: > http://twiki.ipaw.info/pub/Challenge/SpecialIssueIntroduction/challenge-editorial.pdf > > > It's a near-impossible job to reconcile all the comments I have > received. I have tried > to do my best to include as many comments as I could. > > Following Bertram's suggestions, three matrix rows have been redefined, > for which I solicit > your input (*deadline is Thursday 23*). > - 1.6: we now make the distinction between run/partial/simulated > - 2.3 explicitly indicates whether annotations are in scope and > whether they are supported for the queries > - 2.4 explicitly indicates whether time is supported for the queries, > and whether it is required for representing provenance. > > Also, some entries are empty for 2.5, 2.6 and 2.7, could you tell me > how to fill the blanks. 
> > Cheers, > Luc > > -- > Professor Luc Moreau > Electronics and Computer Science tel: +44 23 8059 4487 > University of Southampton fax: +44 23 8059 2865 > Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk > > United Kingdom > http://www.ecs.soton.ac.uk/~lavm > > From owner-provenance-challenge@ipaw.info Tue Nov 28 16:59:55 2006 received: by 10.78.152.9 with HTTP; Tue, 28 Nov 2006 08:59:20 -0800 (PST) domainkey-signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=googlemail.com; h=received:message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; b=QWE2OoESQJhvU4X8bm2wOFNcLbKv4o6eK2Y7h517ydZpWScbBW0EKP7o4wCW4oL++rr6LL/2hY1xexpgokBPdUEuWRXUqyINcBzXjh4m6IlEm4PmrcLLKQCy3N/5GbiIgsJ3lwWJq70qDwKKqYceeMp+VMy49x6Z+c7WhS0F3Ao= message-id: date: Tue, 28 Nov 2006 16:59:20 +0000 from: "Simon Miles" to: provenance-challenge@ipaw.info subject: [provenance-challenge] Second provenance challenge mime-version: 1.0 content-type: text/plain; charset=ISO-8859-1; format=flowed content-transfer-encoding: 7bit content-disposition: inline x-google-sender-auth: b3a7ade7e0ea76b4 x-greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-2.0.2 (crow.ecs.soton.ac.uk [152.78.71.14]); Tue, 28 Nov 2006 16:59:22 +0000 (GMT) x-null-tag: a1aa8923eb0380536955a241ad493698 x-mailscanner-information: Please contact helpdesk@ecs.soton.ac.uk for more information x-ecs-mailscanner: Found to be clean x-ecs-spamscore: s x-mailscanner-from: drsimonmiles@googlemail.com x-spam-status: No sender: owner-provenance-challenge@ipaw.info precedence: bulk reply-to: provenance-challenge@ipaw.info Hello, We have drafted a proposal for a second provenance challenge, derived from that discussed at the workshop in Washington in September. http://twiki.ipaw.info/bin/view/Challenge/SecondProvenanceChallenge We welcome any comments or suggestions - does it seem reasonable and what you were expecting? 
Can I ask that all comments are given by 6th December so that, if acceptable, the challenge can officially start on 8th December. Thanks, Simon, Juliana, Luc From owner-provenance-challenge@ipaw.info Wed Nov 29 11:24:02 2006 received: by 10.78.152.9 with HTTP; Wed, 29 Nov 2006 03:20:06 -0800 (PST) domainkey-signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=googlemail.com; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=aLPDsdwtFrnpBp7dXZSuR42RDrliOqP/iEl7Bt+Y4zEjWf6Pecx6eAdOtZ9xfN7LlgZreXCXxZJjQtmG/QLX1knX7H0RFa/5oPPKEztYQ/dNYMIUu8feDuMyHzem6S8w2Xuv/mBNsZMRA1kpP79iMGibukUvtY0ROnrIhncpPU0= message-id: date: Wed, 29 Nov 2006 11:20:06 +0000 from: "Simon Miles" to: "Jun Zhao" subject: Re: [provenance-challenge] Second provenance challenge cc: provenance-challenge@ipaw.info in-reply-to: mime-version: 1.0 content-type: text/plain; charset=ISO-8859-1; format=flowed content-transfer-encoding: 7bit content-disposition: inline references: x-google-sender-auth: 927f177ca937cbae x-greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-2.0.2 (moorhen.ecs.soton.ac.uk [152.78.68.178]); Wed, 29 Nov 2006 11:20:52 +0000 (GMT) x-null-tag: fd2baa274195fd5a6bcd0e5c2ff107d2 x-mailscanner-information: Please contact helpdesk@ecs.soton.ac.uk for more information x-ecs-mailscanner: Found to be clean x-ecs-spamscore: ss x-mailscanner-from: drsimonmiles@googlemail.com x-spam-status: No sender: owner-provenance-challenge@ipaw.info precedence: bulk reply-to: provenance-challenge@ipaw.info Hello Jun, Thanks for the feedback. Jun Zhao wrote: > As I saw from the conclusion of the first challenge, it seems difficult to > compare the query results returned from different groups. One of the > problems that occurred to me when answering the first challenge were to > understand the scope of querying information space, i.e. 
should I retrieve > information from one run of the workflow or many runs. I am not sure > whether it matters that much to the other projects. I suppose it is implicit in the description and the variation in Question 7 that the answers are for one run of the workflow only. This could be made explicit in the challenge description. The method by which you would distinguish the workflow run of interest from others is certainly interesting. For Southampton, it is part of the query mechanism and so directly relevant for answering the queries in the challenges, but I agree it might not be as relevant for other teams for this challenge. I suggest we leave it to be documented by the teams if they think it relevant to their challenge results. > The second thing (as I read the challenge quickly, I might have missed > it:)), are there any requirements as to which projects we should choose to > pair up and how many we should choose? No, we have placed no requirements on whose data to try and translate / query over - as many other teams as possible! > Maybe we can also share the parsers if that would help? I agree that this is important. We have requested on the page that "...a reference [be] given to a free parser for that format" and that "we strongly encourage (but do not require) teams to export their data in XML". Hopefully this is enough to make the parsing of each other's data as straightforward as possible. Thanks, Simon > cheers, > > Jun > > On Nov 28 2006, Simon Miles wrote: > > > Hello, > > > > We have drafted a proposal for a second provenance challenge, derived > > from that discussed at the workshop in Washington in September. > > > > http://twiki.ipaw.info/bin/view/Challenge/SecondProvenanceChallenge > > > > We welcome any comments or suggestions - does it seem reasonable and > > what you were expecting? Can I ask that all comments are given by 6th > > December so that, if acceptable, the challenge can officially start on > > 8th December. 
> > > > Thanks, > > Simon, Juliana, Luc > > > From owner-provenance-challenge@ipaw.info Tue Dec 5 01:43:47 2006 received: by mail.eecs.harvard.edu (Postfix, from userid 32170) id EC5221A3F27; Mon, 4 Dec 2006 20:42:50 -0500 (EST) subject: Re: [provenance-challenge] Second provenance challenge to: provenance-challenge@ipaw.info date: Mon, 4 Dec 2006 20:42:50 -0500 (EST) in-reply-to: x-mailer: ELM [version 2.5 PL8] mime-version: 1.0 content-type: text/plain; charset=us-ascii content-transfer-encoding: 7bit message-id: <20061205014250.EC5221A3F27@mail.eecs.harvard.edu> from: dholland@eecs.harvard.edu (David Holland) x-greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-2.0.2 (jackdaw.ecs.soton.ac.uk [152.78.68.137]); Tue, 05 Dec 2006 01:43:07 +0000 (GMT) x-null-tag: 0f7ccb726d53f9c89c37af1ed2b713a2 x-mailscanner-information: Please contact helpdesk@ecs.soton.ac.uk for more information x-ecs-mailscanner: Found to be clean x-mailscanner-from: dholland@eecs.harvard.edu x-spam-status: No sender: owner-provenance-challenge@ipaw.info precedence: bulk reply-to: provenance-challenge@ipaw.info > http://twiki.ipaw.info/bin/view/Challenge/SecondProvenanceChallenge So it says : [T]he queries and their expected results were weakly specified, and : so interpreted differently by different groups. but there's no additional clarification of the queries. I think at least some of this should be done beforehand; we've all run the queries and probably in the process noticed things that were underspecified, and it would make the downstream comparison of results easier if all questions that have already arisen can be resolved in advance. Some points that come to mind: - In Q2, what exactly is "the averaging of images with softmean"? From a dataflow perspective, is the cutoff point supposed to be the softmean process executions themselves, or the files used as input to softmean? 
Or, similarly from an events perspective, is the cutoff point where softmean has read the inputs and is doing the computation before generating the outputs, or the point at which softmean first begins execution, or what? - In Q4, does "all invocations" mean "all invocations related to this workflow" or "all invocations that might have ever happened"? Same for Q6. - In Q7, is the variant workload supposed to start from the *same* input files, or new copies of the same input data? Should the variant workload clobber the intermediate and output files from the original, or should it be run such that both can exist simultaneously (e.g., in a different directory)? - For Q8 and Q9, we should all agree on a set of annotations to perform on the various available files, and also when they should be added relative to the workflow execution, so we all get vaguely comparable results searching for them. -- - David A. Holland / dholland@eecs.harvard.edu From owner-provenance-challenge@ipaw.info Tue Dec 5 17:30:53 2006 received: by 10.78.150.13 with HTTP; Tue, 5 Dec 2006 09:30:19 -0800 (PST) domainkey-signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=googlemail.com; h=received:message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=YDd9ai11MAhlwryn7ZFtP+1eN9vIlcVnmr2bt1vHcsXr5bNMOzoy6zMOHUrfxXNzJfPLMbkkJCKRS/GAOGoaiDU17sDMsyFWnnwMC1as2XzqCN6RdKZu9w2Ixuqa6kQelR8iFAah6TrDDc3RmzKXVvfCB0fv3m1PBGkEhk3mILE= message-id: date: Tue, 5 Dec 2006 17:30:19 +0000 from: "Simon Miles" to: provenance-challenge@ipaw.info subject: Re: [provenance-challenge] Second provenance challenge in-reply-to: <20061205014250.EC5221A3F27@mail.eecs.harvard.edu> mime-version: 1.0 content-type: text/plain; charset=ISO-8859-1; format=flowed content-transfer-encoding: 7bit content-disposition: inline references: <20061205014250.EC5221A3F27@mail.eecs.harvard.edu> x-google-sender-auth: 702575721bd9d34a x-greylist: 
Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (crow.ecs.soton.ac.uk [152.78.71.14]); Tue, 05 Dec 2006 17:30:21 +0000 (GMT) x-null-tag: 01abfcd57a16dd2a7ec3ac943f92d5e8 x-mailscanner-information: Please contact helpdesk@ecs.soton.ac.uk for more information x-ecs-mailscanner: Found to be clean x-mailscanner-from: drsimonmiles@googlemail.com x-spam-status: No sender: owner-provenance-challenge@ipaw.info precedence: bulk reply-to: provenance-challenge@ipaw.info Hello David, Thanks, these are good points. I have tried to fix the ambiguities you indicate (details below). http://twiki.ipaw.info/bin/view/Challenge/SecondProvenanceChallenge First, as well as clarifications to the queries, it is apparent from your and Jun's mails that we need to be more explicit about what actually occurs before the provenance data is exported in the challenge. In particular, some queries only make sense if the workflow has been run more than once and we need to be able to identify annotations within the exported data. I've added the following text to make this explicit. "Specifically, the exported data should contain: * Documentation of the three parts of one run of the workflow as shown in the Workflow Parts section below. * Documentation of the three parts of one run of the workflow in the adaptation specified by Provenance Query 7, i.e. replacing the single convert procedure with two procedures, pgmtoppm then pnmtojpeg, in workflow Part 3. * The following annotations: * Anatomy Image 1, as used in the first workflow run, is annotated with key-value pair center=UChicago. * Anatomy Image 2, as used in the first workflow run, is annotated with key-value pairs center=southampton and studyModality=speech. If the output of a team differs from that given above, including omissions of one or other piece of data, please make it clear in your data output." For the queries, I have added the clarifications below. > - In Q2, what exactly is "the averaging of images with softmean"? 
> From a dataflow perspective, is the cutoff point supposed to be > the softmean process executions themselves, or the files used as > input to softmean? Or, similarly from an events perspective, is > the cutoff point where softmean has read the inputs and is doing > the computation before generating the outputs, or the point at > which softmean first begins execution, or what? We will rephrase this as: 2. Find the process that led to Atlas X Graphic, excluding everything prior to softmean outputting the Atlas Image, i.e. the inputs, processing and outputs of align_warp and reslice, and the inputs and processing of softmean will be excluded. > - In Q4, does "all invocations" mean "all invocations related to > this workflow" or "all invocations that might have ever > happened"? Same for Q6. We will rephrase these as: 4. Find all invocations of procedure align_warp that have ever occurred in the system using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday. 6. Find all images ever output from softmean where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12." > - In Q7, is the variant workload supposed to start from the *same* > input files, or new copies of the same input data? Should the > variant workload clobber the intermediate and output files from > the original, or should it be run such that both can exist > simultaneously (e.g., in a different directory)? The query is rephrased below to answer your first point. The second point above seems to assume too much about the operation of the system: overwriting of data only makes sense in some systems (some systems may pass data by value and never store it in a file or by other means). 
I feel that stating it would be too restrictive - if it is important to understanding your provenance data, please make it clear with the exported data. 7. A user has run the workflow twice on the same input files, in the second instance replacing each convert procedure in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant. > - For Q8 and Q9, we should all agree on a set of annotations to > perform on the various available files, and also when they should > be added relative to the workflow execution, so we all get > vaguely comparable results searching for them. OK, hopefully this point is addressed by the specification of annotations at the start of the email. Thanks, Simon From owner-provenance-challenge@ipaw.info Wed Dec 6 00:12:05 2006 received: from smtp03.bis.eu.blackberry.com (smtp03.bis.eu.blackberry.com [216.9.253.50]) by crow.ecs.soton.ac.uk (8.13.1/8.13.1) with ESMTP id kB60BQ62012400 for ; Wed, 6 Dec 2006 00:11:35 GMT message-id: <1653624061-1165363888-cardhu_blackberry.rim.net-53579765-@bxe049-cell00.bisx.produk.on.blackberry> references: <20061205014250.EC5221A3F27@mail.eecs.harvard.edu> in-reply-to: sensitivity: Normal importance: Normal to: provenance-challenge@ipaw.info subject: {SPAM?} Re: [provenance-challenge] Second provenance challenge from: jmgomez@isoco.com date: Wed, 6 Dec 2006 00:11:18 +0000 content-type: text/plain mime-version: 1.0 x-greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-2.0.2 (crow.ecs.soton.ac.uk [152.78.71.14]); Wed, 06 Dec 2006 00:11:36 +0000 (GMT) x-null-tag: 84d6972eb5f96e2d1974cde44a6c5be4 x-mailscanner-information: Please contact helpdesk@ecs.soton.ac.uk for more information x-ecs-mailscanner: Found to be clean x-ecs-spamcheck: spam, SpamAssassin (score=6.08, required 6, INFO_TLD 1.27, LW_STOCK_SPAM4 1.66, 
MIME_BASE64_NO_NAME 0.22, MIME_BASE64_TEXT 1.89, NO_REAL_NAME 0.96, TW_PG 0.08) x-ecs-spamscore: ssssss x-mailscanner-from: jmgomez@isoco.com x-spam-status: Yes content-transfer-encoding: 8bit x-mime-autoconverted: from base64 to 8bit by seer.ecs.soton.ac.uk id kB60Bxsg004652 sender: owner-provenance-challenge@ipaw.info precedence: bulk reply-to: provenance-challenge@ipaw.info Hi Simon, As I said, our provenance system is still far from our goals but we still want to participate. Will provide you with the required materials (traces, etc.) asap. Thanks, Jose PS: thanks for the logs! --- -----Original Message----- From: "Simon Miles" Date: Tue, 5 Dec 2006 17:30:19 To:provenance-challenge@ipaw.info Subject: Re: [provenance-challenge] Second provenance challenge Hello David, Thanks, these are good points. I have tried to fix the ambiguities you indicate (details below). http://twiki.ipaw.info/bin/view/Challenge/SecondProvenanceChallenge First, as well as clarifications to the queries, it is apparent from your and Jun's mails that we need to be more explicit about what actually occurs before the provenance data is exported in the challenge. In particular, some queries only make sense if the workflow has been run more than once and we need to be able to identify annotations within the exported data. I've added the following text to make this explicit. "Specifically, the exported data should contain: * Documentation of the three parts of one run of the workflow as shown in the Workflow Parts section below. * Documentation of the three parts of one run of the workflow in the adaptation specified by Provenance Query 7, i.e. replacing the single convert procedure with two procedures, pgmtoppm then pnmtojpeg, in workflow Part 3. * The following annotations: * Anatomy Image 1, as used in the first workflow run, is annotated with key-value pair center=UChicago. 
* Anatomy Image 2, as used in the first workflow run, is annotated with key-value pairs center=southampton and studyModality=speech. If the output of a team differs from that given above, including omissions of one or other piece of data, please make it clear in your data output." For the queries, I have added the clarifications below. > - In Q2, what exactly is "the averaging of images with softmean"? > From a dataflow perspective, is the cutoff point supposed to be > the softmean process executions themselves, or the files used as > input to softmean? Or, similarly from an events perspective, is > the cutoff point where softmean has read the inputs and is doing > the computation before generating the outputs, or the point at > which softmean first begins execution, or what? We will rephrase this as: 2. Find the process that led to Atlas X Graphic, excluding everything prior to softmean outputting the Atlas Image, i.e. the inputs, processing and outputs of align_warp and reslice, and the inputs and processing of softmean will be excluded. > - In Q4, does "all invocations" mean "all invocations related to > this workflow" or "all invocations that might have ever > happened"? Same for Q6. We will rephrase these as: 4. Find all invocations of procedure align_warp that have ever occurred in the system using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday. 6. Find all images ever output from softmean where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12." > - In Q7, is the variant workload supposed to start from the *same* > input files, or new copies of the same input data? 
Should the > variant workload clobber the intermediate and output files from > the original, or should it be run such that both can exist > simultaneously (e.g., in a different directory)? The query is rephrased below to answer your first point. The second point above seems to assume too much about the operation of the system: overwriting of data only makes sense in some systems (some systems may pass data by value and never store it in a file or by other means). I feel that stating it would be too restrictive - if it is important to understanding your provenance data, please make it clear with the exported data. 7. A user has run the workflow twice on the same input files, in the second instance replacing each convert procedure in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant. > - For Q8 and Q9, we should all agree on a set of annotations to > perform on the various available files, and also when they should > be added relative to the workflow execution, so we all get > vaguely comparable results searching for them. OK, hopefully this point is addressed by the specification of annotations at the start of the email. 
Thanks,
Simon

From owner-provenance-challenge@ipaw.info Tue Dec 12 10:53:24 2006
date: Tue, 12 Dec 2006 10:51:58 +0000
from: "Simon Miles"
to: provenance-challenge@ipaw.info
subject: [provenance-challenge] Second challenge starts!

Dear challenge participants,

Thanks to all those who commented on the second challenge specification. We have tried to fix the text where problems or ambiguities were brought to our attention.

Please now consider the challenge officially started. Any changes from now on will be minor clarifications and should not affect the content of the exercise.

Introductions and instructions have been added to the main page:

http://twiki.ipaw.info

Please advertise it to anyone you think may be interested (as well as taking part yourselves).
We look forward to seeing your uploaded provenance data at the end of January!

Thanks,
Simon, Juliana, Luc

From owner-provenance-challenge@ipaw.info Tue Dec 12 13:56:33 2006
date: Tue, 12 Dec 2006 13:52:32 +0000
from: "Simon Miles"
to: provenance-challenge@ipaw.info
subject: [provenance-challenge] New provenance queries from Washington workshop

Hello again,

During the Washington provenance workshop, it was suggested that we develop new provenance queries to examine the edge cases and issues not touched by the first challenge. We discussed many such queries, related to long-term use of provenance, accidental corruption of data etc. These have now been written up and uploaded to the TWiki. They aren't intended to be anything to do with the second challenge, but are hopefully a useful resource in themselves.
http://twiki.ipaw.info/bin/view/Challenge/ProvenanceQueries

If you remember any others I've forgotten, please feel free to add them!

Thanks,
Simon

P.S. If you want even more provenance-related questions: we captured many use cases in interviews with biologists, chemists, physicists, computer scientists and social scientists at the start of our project in 2004. Many of these are specified in a paper (http://eprints.ecs.soton.ac.uk/13242, soon to be published in Journal of Grid Computing) and even more are available on the website www.pasoa.org

From owner-provenance-challenge@ipaw.info Tue Dec 12 14:20:51 2006
from: "Bertram Ludaescher"
message-id: <17790.47770.8000.360615@gargle.gargle.HOWL>
date: Tue, 12 Dec 2006 06:20:10 -0800
to: provenance-challenge@ipaw.info
subject: Re: [provenance-challenge] New provenance queries from Washington workshop

Hi Simon and all:

Re. the 2nd Prov. Challenge: I understand this is a "multi-phase" one, with the first phase ending in January.

I'm a bit concerned that many teams might still be recovering from the 1st Challenge (speaking here, at least in part, for RWS and DAKS/COMAD).

I'd like to suggest having a "roll call" to see who actually plans to participate in the second challenge:
(a) with the current schedule
(b) with a schedule starting a bit later.

For example, I would have liked to give some feedback on the 2nd challenge, but I guess I'm a bit late (may still do so).

Also, I think the teams for the 2nd challenge might benefit from digesting fully what was learned from the 1st one. For example, it might be good to read each other's papers (pre-prints) as they become available in Jan/Feb.

Thoughts?

Overall, I think it's a great idea to follow up on the 1st challenge and revisit the same workflow for the 2nd challenge (taking into account what was learned). Provenance interop sounds like a good topic. Maybe in addition to the interop issue for the 1st challenge workflow, there could be a 2nd workflow that has advanced/alternative processing requirements.

Or maybe "meta-teams" or new teams could suggest specific "workflow patterns" and the provenance issues related to them. Patterns could include, e.g., data-dependent branching, pipelined execution, nested workflows, etc.
Bertram

From owner-provenance-challenge@ipaw.info Tue Dec 12 14:34:44 2006
message-id: <457EBCFF.6080508@manchester.ac.uk>
date: Tue, 12 Dec 2006 14:30:23 +0000
from: Carole Goble
to: provenance-challenge@ipaw.info
cc: June Finch, daniele turi, Jun Zhao, rds@cs.man.ac.uk, Duncan Hull, Antoon Goderis, Qiuwei Yu
subject: Re: [provenance-challenge] New provenance queries from Washington workshop
references: <17790.47770.8000.360615@gargle.gargle.HOWL>
in-reply-to: <17790.47770.8000.360615@gargle.gargle.HOWL>

Bertram

This is a coincidence: we also discussed this in the myGrid planning meeting today, as we are concerned that we do not have the resources to set aside for the challenge and also produce the Taverna 1.5 release, esp. as Jun, Duncan and Antoon are writing up their PhDs right now and Sky isn't up to speed as he has just started.

So we are seriously considering withdrawing from the second challenge.

Carole