[NLP2RDF] NIF and provenance

Sat Dec 20 12:27:48 CET 2014

Hi Peter,

Might I suggest that you look at the W3C Web Annotation working group (http://www.w3.org/annotation/) and specifically the Open Annotation data model (http://www.openannotation.org/spec/core/)?

In prior work, we also tried to make a proposal to capture detailed provenance information. It is a "heavy" representation, and there is probably room for improvement, but we felt that it was important to think about how to capture compositional annotations and their provenance as well. Please see http://www.jbiomedsem.com/content/4/1/38.

Best regards,
Karin

--
Karin Verspoor, PhD
Associate Professor, Dept of Computing and Information Systems
The University of Melbourne
Victoria 3010 Australia
T: +61 3 8344 4902 | M: +61 (0)4 7840 8290
Email: karin.verspoor at unimelb.edu.au

On 19/12/2014, at 7:05 PM, Peter Menke <pmenke at techfak.uni-bielefeld.de> wrote:

Dear all,

let me give a little more detail in addition to the information provided by Philipp.

Our goal is to add provenance information in such a way that it can be exploited to retrieve subsets of NIF corpora efficiently. We originally approached this from a querying perspective: we wanted to retrieve the correct triples for questions such as

- Give me all texts in the corpus annotated by person X.
- Give me those layers of PoS information generated by service Y.
- Give me all annotations from the corpus that have been validated by service Z.
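
To make these three questions concrete, here is a minimal Python sketch. It assumes that every annotation layer is addressable as its own unit carrying provenance metadata, which is exactly what the next paragraph says is hard to achieve in NIF. The Layer class, its field names and all identifiers are hypothetical and not part of NIF or PROV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical unit of annotation with provenance metadata (not a NIF class)."""
    text_id: str
    kind: str                          # e.g. "pos" or "tokens"
    produced_by: str                   # person or service that created the layer
    validated_by: frozenset = frozenset()


# Made-up sample data, for illustration only.
LAYERS = [
    Layer("text1", "pos", "serviceY"),
    Layer("text1", "tokens", "personX", frozenset({"serviceZ"})),
    Layer("text2", "pos", "personX"),
]


def texts_annotated_by(layers, agent):
    """All texts that have at least one layer produced by the given agent."""
    return sorted({layer.text_id for layer in layers if layer.produced_by == agent})


def layers_of_kind_by(layers, kind, agent):
    """All layers of one kind (e.g. PoS) generated by the given service."""
    return [layer for layer in layers
            if layer.kind == kind and layer.produced_by == agent]


def layers_validated_by(layers, agent):
    """All layers that the given service has validated."""
    return [layer for layer in layers if agent in layer.validated_by]


print(texts_annotated_by(LAYERS, "personX"))
print(layers_of_kind_by(LAYERS, "pos", "serviceY"))
print(layers_validated_by(LAYERS, "serviceZ"))
```

All three queries become simple filters once a layer can be identified on its own; the difficulty lies in getting to that point.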

However, some properties of NIF make it challenging for us to model this kind of information. One of them is the fact that NIF annotations use the "#char=n,m" notation as subjects for many related annotations. This makes it difficult to identify and address different kinds of annotations resulting from different activities, yet the identification of single layers of annotations seems very important for our problems.

For instance, one of our use cases involves multiple PoS taggers that use the same tag set. They produce different results, but we cannot express the provenance information in a way that allows us to identify the origin of a particular PoS tag (e.g., when we want to answer questions such as "Which tagger can be blamed for this erroneous tag? Tagger A or tagger B?").
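
The following Python sketch illustrates the problem under simplifying assumptions: triples are modelled as plain tuples, an RDF graph as a set of them, and the document URI and tag values are made up. The predicate string "nif:posTag" is used for illustration only.

```python
# Hypothetical string URI in the "#char=n,m" style; both taggers use it as subject.
TOKEN = "http://example.org/doc1#char=0,3"
POS = "nif:posTag"  # illustrative predicate name

tagger_a = {(TOKEN, POS, "NN")}  # output of tagger A (made-up tag)
tagger_b = {(TOKEN, POS, "VB")}  # output of tagger B (made-up tag)

# Merging the two results: an RDF graph is a set of triples.
merged = tagger_a | tagger_b

# Both tags now hang off the same subject ...
tags = sorted(o for s, p, o in merged if s == TOKEN and p == POS)
print(tags)  # ['NN', 'VB']

# ... and no triple in the merged graph says which tagger produced which tag.
mentions_tagger = any("tagger" in part.lower() for triple in merged for part in triple)
print(mentions_tagger)  # False
```

Once the two outputs are merged, the question "tagger A or tagger B?" cannot be answered from the data alone, because the shared subject carries no handle for the individual annotation.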

We also looked at annotations that correct a previous layer of data, such as a manual correction of an automated tagging service. As soon as the automated tagger's result is published in NIF, there is no easy way of adding information about single corrections of its results.

This, in a nutshell, is the background of what we attempt to achieve with provenance metadata in our current project. I will add further information as soon as possible, but I hope that this gives you a better impression and avoids some misunderstandings.

Best regards,
Peter Menke

--
Peter Menke
SFB 673 "Alignment in Communication"
Project X1 "Multimodal Alignment Corpora"
Universität Bielefeld
Postfach 10 01 31
33501 Bielefeld

CITEC-Gebäude, Raum 2.309
Telefon (+49 521) 106-67328

On 18 December 2014 at 23:32:40, Philipp Cimiano (cimiano at cit-ec.uni-bielefeld.de) wrote:
Dear Rob, all,

thanks for your answer. I should have been more precise in my
question. We are actually building on PROV as well. So we are not
looking into extending NIF by a provenance layer, but really working out
best practices for representing provenance information in NIF building
on PROV.

Our actual question is where to attach the provenance information.
We see two options: i) reifying all annotation triples and adding
provenance information to the reified object representing the annotation,
or ii) using named graphs and attaching provenance information to the graph.
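
As a rough illustration of the two options, here is a minimal Python sketch that models triples and quads as plain tuples. It is not a proposal for either side: all "ex:" identifiers, the "ex:provenance" graph and the helper functions are hypothetical, and prov:wasAttributedTo is just one PROV property that could carry the link to the tagger.

```python
TOKEN = "http://example.org/doc1#char=0,3"  # hypothetical NIF string URI
ANNOTATION = (TOKEN, "nif:posTag", "NN")    # illustrative annotation triple


def reify(stmt_id, triple, agent):
    """Option i: describe one triple as an rdf:Statement and attribute it."""
    s, p, o = triple
    return {
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
        (stmt_id, "prov:wasAttributedTo", agent),
    }


def in_named_graph(graph_id, triples, agent):
    """Option ii: put the triples into a named graph and attribute the graph."""
    quads = {(s, p, o, graph_id) for s, p, o in triples}
    quads.add((graph_id, "prov:wasAttributedTo", agent, "ex:provenance"))
    return quads


def agents_for(quads, triple):
    """Find the agents of every named graph that contains the given triple."""
    graphs = {g for s, p, o, g in quads if (s, p, o) == triple}
    return {o for s, p, o, g in quads
            if s in graphs and p == "prov:wasAttributedTo"}


reified = reify("ex:stmt1", ANNOTATION, "ex:taggerA")
quads = in_named_graph("ex:layerA", {ANNOTATION}, "ex:taggerA")

print(len(reified))                   # 5
print(len(quads))                     # 2
print(agents_for(quads, ANNOTATION))  # {'ex:taggerA'}
```

In this sketch, reification needs four describing triples plus one provenance triple for every annotation triple, while the named graph needs one provenance statement per graph regardless of how many annotations it holds. Whether that difference matters in practice depends on the store and the queries.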

Are there any experiences with these two options that you can share with us?

Best regards,

Philipp.

On 18.12.14 15:12, Rob H Warren wrote:
Philip,

What would be the advantage of adding to NIF versus reusing the PROV ontology for this
purpose? I have used this for some projects and have not seen a corner case yet.

-rhw

On Dec 17, 2014, at 5:02 PM, nlp2rdf-request at lists.informatik.uni-leipzig.de wrote:
Dear all,

as I mentioned to Sebastian briefly, we are working on adding a
provenance layer to NIF. And we would like to talk to you.

I sort of agreed with Sebastian that we could meet in Leipzig to discuss
this.

However, I think it is more efficient if we exchange material beforehand
and have a telco early January to discuss the material.

Sebastian: you mentioned that you have some proposals on how to
represent provenance of NIF annotations. Can you share that proposal
with us?

Peter: can you circulate our draft to the people in Leipzig (i.e.
Sebastian and Martin)?

We can then start from there and organize a telco in January.

Best regards,

Philipp.

--
--
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld

Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano at cit-ec.uni-bielefeld.de

Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany

------------------------------

_______________________________________________
NLP2RDF mailing list
NLP2RDF at lists.informatik.uni-leipzig.de
http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
