From: Luc Moreau <L.Moreau AT ecs.soton.ac.uk>
Date: Mon, 20 Nov 2006 13:20:20 +0000
All, (Please read!)

The third draft of the challenge editorial is available from:
http://twiki.ipaw.info/pub/Challenge/SpecialIssueIntroduction/challenge-editorial.pdf

It's a near-impossible job to reconcile all the comments I have received. I have tried to do my best to include as many comments as I could.

Following Bertram's suggestions, three matrix rows have been redefined, for which I solicit your input (*deadline is Thursday 23*).
- 1.6: we now make the distinction between run/partial/simulated
- 2.3 explicitly indicates whether annotations are in scope and whether they are supported for the queries
- 2.4 explicitly indicates whether time is supported for the queries, and whether it is required for representing provenance.

Also, some entries are empty for 2.5, 2.6 and 2.7; could you tell me how to fill the blanks?

Cheers,
Luc

--
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau AT ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
From: Luc Moreau <L.Moreau AT ecs.soton.ac.uk>
Date: Thu, 23 Nov 2006 12:20:11 +0000
All,

Just a kind reminder: I'd be grateful to receive your inputs for the new matrix entries today.

Thanks,
Luc
From: Roger Barga <barga AT microsoft.com>
Date: Thu, 23 Nov 2006 12:10:24 -0800
Luciano, thanks for taking the initiative on this!

roger

________________________________
From: Luciano Digiampietri [luciano.digiampietri AT gmail.com]
Sent: Thursday, November 23, 2006 11:23 AM
To: provenance-challenge AT ipaw.info; Roger Barga
Subject: Re: [provenance-challenge] editorial: third draft

Hi Luc,

I am working with Roger Barga on the Redux framework.

We want to update two empty fields:
2.3 Arbitrary annotations in scope/implemented => yes
2.7 Abstraction mechanism => layered provenance model

Thanks,

--
Luciano Antonio Digiampietri
PhD candidate in Computer Science, UNICAMP
http://www.ic.unicamp.br/~luciano
From: Luc Moreau <L.Moreau AT ecs.soton.ac.uk>
Date: Mon, 27 Nov 2006 09:16:44 +0000
Thanks Luciano.

What about time? It is clearly supported, but is it required in your provenance representation?

Luc
From: "Simon Miles" <sm AT ecs.soton.ac.uk>
Date: Tue, 28 Nov 2006 16:59:20 +0000
Hello,

We have drafted a proposal for a second provenance challenge, derived from that discussed at the workshop in Washington in September.

http://twiki.ipaw.info/bin/view/Challenge/SecondProvenanceChallenge

We welcome any comments or suggestions - does it seem reasonable and what you were expecting? Can I ask that all comments are given by 6th December so that, if acceptable, the challenge can officially start on 8th December.

Thanks,
Simon, Juliana, Luc
From: "Simon Miles" <sm AT ecs.soton.ac.uk>
Date: Wed, 29 Nov 2006 11:20:06 +0000
Hello Jun,

Thanks for the feedback.

Jun Zhao wrote:
> As I saw from the conclusion of the first challenge, it seems difficult to
> compare the query results returned from different groups. One of the
> problems that occurred to me when answering the first challenge were to
> understand the scope of querying information space, i.e. should I retrieve
> information from one run of the workflow or many runs. I am not sure
> whether it matters that much to the other projects.

I suppose it is implicit in the description and the variation in Question 7 that the answers are for one run of the workflow only. This could be made explicit in the challenge description.

The method by which you would distinguish the workflow run of interest from others is certainly interesting. For Southampton, it is part of the query mechanism and so directly relevant for answering the queries in the challenges, but I agree it might not be as relevant for other teams for this challenge. I suggest we leave it to be documented by the teams if they think it relevant to their challenge results.

> The second thing (as I read the challenge quickly, I might have missed
> it:)), are there any requirements as to which projects we should choose to
> pair up and how many we should choose?

No, we have placed no requirements on whose data to try and translate / query over - as many other teams as possible!

> Maybe we can also share the parsers if that would help?

I agree that this is important. We have requested on the page that "...a reference [be] given to a free parser for that format" and that "we strongly encourage (but do not require) teams to export their data in XML". Hopefully this is enough to make the parsing of each others' data as straightforward as possible.

Thanks,
Simon
From: dholland AT eecs.harvard.edu (David Holland)
Date: Mon, 4 Dec 2006 20:42:50 -0500 (EST)
> http://twiki.ipaw.info/bin/view/Challenge/SecondProvenanceChallenge
So it says
: [T]he queries and their expected results were weakly specified, and
: so interpreted differently by different groups.
but there's no additional clarification of the queries.
I think at least some of this should be done beforehand; we've all run
the queries and probably in the process noticed things that were
underspecified, and it would make the downstream comparison of results
easier if all questions that have already arisen can be resolved in
advance.
Some points that come to mind:
- In Q2, what exactly is "the averaging of images with softmean"?
From a dataflow perspective, is the cutoff point supposed to be
the softmean process executions themselves, or the files used as
input to softmean? Or, similarly from an events perspective, is
the cutoff point where softmean has read the inputs and is doing
the computation before generating the outputs, or the point at
which softmean first begins execution, or what?
- In Q4, does "all invocations" mean "all invocations related to
this workflow" or "all invocations that might have ever
happened"? Same for Q6.
- In Q7, is the variant workload supposed to start from the *same*
input files, or new copies of the same input data? Should the
variant workload clobber the intermediate and output files from
the original, or should it be run such that both can exist
simultaneously (e.g., in a different directory)?
- For Q8 and Q9, we should all agree on a set of annotations to
perform on the various available files, and also when they should
be added relative to the workflow execution, so we all get
vaguely comparable results searching for them.
--
- David A. Holland / dholland AT eecs.harvard.edu
From: "Simon Miles" <sm AT ecs.soton.ac.uk>
Date: Tue, 5 Dec 2006 17:30:19 +0000
Hello David,
Thanks, these are good points. I have tried to fix the ambiguities
you indicate (details below).
http://twiki.ipaw.info/bin/view/Challenge/SecondProvenanceChallenge
First, as well as clarifications to the queries, it is apparent from
your and Jun's mails that we need to be more explicit about what
actually occurs before the provenance data is exported in the
challenge. In particular, some queries only make sense if the
workflow has been run more than once and we need to be able to
identify annotations within the exported data. I've added the
following text to make this explicit.
"Specifically, the exported data should contain:
* Documentation of the three parts of one run of the workflow as
shown in the Workflow Parts section below.
* Documentation of the three parts of one run of the workflow in
the adaptation specified by Provenance Query 7, i.e. replacing the
single convert procedure with two procedures, pgmtoppm then pnmtojpeg,
in workflow Part 3.
* The following annotations:
* Anatomy Image 1, as used in the first workflow run, is
annotated with key-value pair center=UChicago.
* Anatomy Image 2, as used in the first workflow run, is
annotated with key-value pairs center=southampton and
studyModality=speech.
If the output of a team differs from that given above, including
omissions of one or other piece of data, please make it clear in your
data output."
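As a rough illustration only (it is not part of the challenge text), the annotations above could be held as plain key-value pairs and searched as follows. The dictionary layout and the helper name find_annotated are assumptions made for this sketch; each team's exported format will differ.

```python
# Hypothetical in-memory form of the annotations listed above.
# The structure is an assumption for illustration, not a required format.
annotations = {
    "Anatomy Image 1": {"center": "UChicago"},
    "Anatomy Image 2": {"center": "southampton", "studyModality": "speech"},
}


def find_annotated(annotations, key, value):
    """Return the sorted names of items annotated with key=value."""
    return sorted(
        name for name, pairs in annotations.items() if pairs.get(key) == value
    )


print(find_annotated(annotations, "center", "UChicago"))
print(find_annotated(annotations, "studyModality", "speech"))
```

Agreeing on the exact keys and values in advance is what makes the results of annotation queries comparable across teams.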
For the queries, I have added the clarifications below.
> - In Q2, what exactly is "the averaging of images with softmean"?
> From a dataflow perspective, is the cutoff point supposed to be
> the softmean process executions themselves, or the files used as
> input to softmean? Or, similarly from an events perspective, is
> the cutoff point where softmean has read the inputs and is doing
> the computation before generating the outputs, or the point at
> which softmean first begins execution, or what?
We will rephrase this as:
2. Find the process that led to Atlas X Graphic, excluding everything
prior to softmean outputting the Atlas Image, i.e. the inputs,
processing and outputs of align_warp and reslice, and the inputs and
processing of softmean will be excluded.
> - In Q4, does "all invocations" mean "all invocations related to
> this workflow" or "all invocations that might have ever
> happened"? Same for Q6.
We will rephrase these as:
4. Find all invocations of procedure align_warp that have ever
occurred in the system using a twelfth order nonlinear 1365 parameter
model (see model menu describing possible values of parameter "-m 12"
of align_warp) that ran on a Monday.
6. Find all images ever output from softmean where the warped images
taken as input were align_warped using a twelfth order nonlinear 1365
parameter model, i.e. "where softmean was preceded in the workflow,
directly or indirectly, by an align_warp procedure with argument -m
12."
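To make the scope of the rephrased Query 4 concrete, here is a minimal sketch that filters every recorded invocation in a store, regardless of workflow run. The record layout (procedure, args, started) and the function name monday_align_warps are assumptions for this example; the dates are arbitrary sample values.

```python
from datetime import datetime

# Hypothetical invocation records covering everything the system has seen.
invocations = [
    {"procedure": "align_warp", "args": {"-m": "12"},
     "started": datetime(2006, 12, 4, 9, 0)},   # a Monday
    {"procedure": "align_warp", "args": {"-m": "12"},
     "started": datetime(2006, 12, 5, 9, 0)},   # a Tuesday
    {"procedure": "align_warp", "args": {"-m": "3"},
     "started": datetime(2006, 12, 4, 10, 0)},  # Monday, different model
    {"procedure": "reslice", "args": {},
     "started": datetime(2006, 12, 4, 11, 0)},  # Monday, different procedure
]


def monday_align_warps(records):
    """Select align_warp invocations run with -m 12 on a Monday."""
    return [
        r for r in records
        if r["procedure"] == "align_warp"
        and r["args"].get("-m") == "12"
        and r["started"].weekday() == 0  # datetime.weekday(): Monday is 0
    ]


for record in monday_align_warps(invocations):
    print(record["procedure"], record["started"].isoformat())
```

Query 6 would need an additional transitive step (finding softmean outputs whose ancestors include such an align_warp), which is not shown here.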
> - In Q7, is the variant workload supposed to start from the *same*
> input files, or new copies of the same input data? Should the
> variant workload clobber the intermediate and output files from
> the original, or should it be run such that both can exist
> simultaneously (e.g., in a different directory)?
The query is rephrased below to answer your first point. The second
point above seems to assume too much about the operation of the
system: overwriting of data only makes sense in some systems (some
systems may pass data by value and never store it in a file or by
other means). I feel that stating it would be too restrictive - if it
is important to understanding your provenance data, please make it
clear with the exported data.
7. A user has run the workflow twice on the same input files, in the
second instance replacing each convert procedure in the final stage
with two procedures: pgmtoppm, then pnmtojpeg. Find the differences
between the two workflow runs. The exact level of detail in the
difference that is detected by a system is up to each participant.
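At the coarsest level of detail, the difference asked for in Query 7 could be computed over the lists of procedures in each run. The sketch below is one such minimal comparison; the run lists are abbreviated to procedures named in this thread, and diff_runs is a hypothetical helper, not any participant's method.

```python
def diff_runs(first, second):
    """Return (procedures only in first, procedures only in second)."""
    removed = [p for p in first if p not in second]
    added = [p for p in second if p not in first]
    return removed, added


# Abbreviated, illustrative procedure lists for the two runs.
run1 = ["align_warp", "reslice", "softmean", "convert"]
run2 = ["align_warp", "reslice", "softmean", "pgmtoppm", "pnmtojpeg"]

removed, added = diff_runs(run1, run2)
print("only in first run:", removed)
print("only in second run:", added)
```

A system that records more detail could equally report differences in intermediate data or arguments; the level of detail is left to each participant.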
> - For Q8 and Q9, we should all agree on a set of annotations to
> perform on the various available files, and also when they should
> be added relative to the workflow execution, so we all get
> vaguely comparable results searching for them.
OK, hopefully this point is addressed by the specification of
annotations at the start of the email.
Thanks,
Simon
From: jmgomez AT isoco.com
Date: Wed, 6 Dec 2006 00:11:18 +0000
Hi Simon,
As I said, our provenance system is still far from our goals but we still want
to participate. We will provide you with the required materials (traces, etc.)
asap.
Thanks,
Jose
PS: thanks for the logs!
From: "Simon Miles" <sm AT ecs.soton.ac.uk>
Date: Tue, 12 Dec 2006 10:51:58 +0000
Dear challenge participants,

Thanks to all those that commented on the second challenge specification. We have tried to fix the text where problems or ambiguities were brought to our attention.

Please now consider the challenge officially started. Any changes from now on will be minor clarifications and should not affect the content of the exercise.

Introductions and instructions have been added to the main page:
http://twiki.ipaw.info

Please advertise it to anyone you think may be interested (as well as taking part yourselves). We look forward to seeing your uploaded provenance data at the end of January!

Thanks,
Simon, Juliana, Luc
From: "Simon Miles" <sm AT ecs.soton.ac.uk>
Date: Tue, 12 Dec 2006 13:52:32 +0000
Hello again,

During the Washington provenance workshop, it was suggested that we develop new provenance queries to examine the edge cases and issues not touched by the first challenge. We discussed many such queries, related to long-term use of provenance, accidental corruption of data etc. These have now been written up and uploaded to the TWiki. They aren't intended to be anything to do with the second challenge, but are hopefully a useful resource in themselves.

http://twiki.ipaw.info/bin/view/Challenge/ProvenanceQueries

If you remember any others I've forgotten, please feel free to add them!

Thanks,
Simon

P.S. If you want even more provenance-related questions: we captured many use cases in interviews with biologists, chemists, physicists, computer scientists and social scientists at the start of our project in 2004. Many of these are specified in a paper (http://eprints.ecs.soton.ac.uk/13242, soon to be published in Journal of Grid Computing) and even more are available on the website www.pasoa.org
From: "Bertram Ludaescher" <ludaesch AT ucdavis.edu>
Date: Tue, 12 Dec 2006 06:20:10 -0800
Hi Simon and all:

Re. the 2nd Prov. Challenge: I understand this is a "multi-phase" one, with the first phase ending in January.

I'm a bit concerned that many teams might still be recovering from the 1st Challenge (talking here, at least in part, for RWS and DAKS/COMAD).

I'd like to suggest that we have a "roll call" and see who actually plans to participate in the second challenge:
(a) with the current schedule
(b) with a schedule starting a bit later.

For example, I would have liked to give some feedback on the 2nd challenge but I guess I'm a bit late (may still do so).

Also, I think the teams for the 2nd challenge might benefit from digesting fully what was learned from the 1st one. For example, it might be good to read each other's papers (pre-prints) as they become available in Jan/Feb.

Thoughts?

Overall, I think it's a great idea to follow up on the 1st challenge and revisit the same workflow for the 2nd challenge (taking into account what was learned). Provenance interop sounds like a good topic. Maybe in addition to the interop issue for the 1st challenge workflow, there could be a 2nd workflow that has advanced/alternative processing requirements.

Or maybe "meta-teams" or new teams could suggest specific "workflow patterns" and the provenance issues related to them. Patterns could include, e.g., data-dependent branching, pipelined execution, nested workflows, etc.

Bertram
From: Carole Goble <carole.goble AT manchester.ac.uk>
Date: Tue, 12 Dec 2006 14:30:23 +0000
Bertram

This is a coincidence -- we also discussed this in the myGrid planning meeting today, as we are concerned that we do not have the resources to set aside for the challenge and produce the Taverna 1.5 release, esp. as Jun, Duncan and Antoon are writing up their PhDs right now and Sky isn't up to speed as he has just started.

So we are seriously considering withdrawing from the second challenge.

Carole