[NLP2RDF] NIF and provenance

Fri Dec 19 09:05:12 CET 2014

Dear all,

let me add a little more detail to the information Philipp provided.

Our goal is to add provenance information in such a way that it can be exploited to efficiently retrieve subsets of NIF corpora. We originally approached this from a querying perspective: we wanted to be able to retrieve the correct triples for questions such as

- Give me all texts in the corpus annotated by person X.
- Give me those layers of PoS information generated by service Y.
- Give me all annotations from the corpus that have been validated by service Z.

However, some properties of NIF make it challenging for us to model this kind of information. One of them is that NIF uses the same "#char=n,m" URI as the subject of many related annotations of a string. This makes it difficult to identify and address different kinds of annotations resulting from different activities, although identifying individual annotation layers seems essential for our problems.

For instance, one of our use cases involves multiple PoS taggers that use the same tag set. They produce different results, but we cannot express provenance information in a way that identifies the origin of a particular PoS tag (e.g., when we want to answer questions such as "Which tagger is to blame for this erroneous tag, tagger A or tagger B?").
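To make the tagger problem concrete, here is a minimal Python sketch of option ii from Philipp's message quoted below: each tagger run writes its PoS layer into its own named graph, and PROV information is attached to the graph. It uses plain tuples instead of an RDF library. The example.org URIs, graph names and agents are hypothetical, and nif:posTag and prov:wasAttributedTo are used only as illustrative properties.

```python
from typing import List, NamedTuple, Optional, Set, Tuple

# NIF 2.0 core and W3C PROV-O namespaces; EX is a hypothetical example namespace.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # named graph that contains the triple


token = EX + "doc1#char=0,5"          # both taggers annotate the same string
graph_a = EX + "graph/taggerA-run1"   # hypothetical graph names, one per tagger run
graph_b = EX + "graph/taggerB-run1"
meta = EX + "graph/meta"              # graph holding provenance about the other graphs

quads: List[Quad] = [
    Quad(token, NIF + "posTag", "NN", graph_a),
    Quad(token, NIF + "posTag", "VB", graph_b),
    Quad(graph_a, PROV + "wasAttributedTo", EX + "agent/taggerA", meta),
    Quad(graph_b, PROV + "wasAttributedTo", EX + "agent/taggerB", meta),
]


def agent_of(graph: str) -> Optional[str]:
    """Look up the agent a named graph is attributed to (None if unknown)."""
    for q in quads:
        if q.g == meta and q.s == graph and q.p == PROV + "wasAttributedTo":
            return q.o
    return None


def who_tagged(subject: str, tag: str) -> Set[str]:
    """Which agents asserted `tag` as the PoS tag of `subject`?"""
    return {
        agent
        for q in quads
        if q.s == subject and q.p == NIF + "posTag" and q.o == tag
        for agent in [agent_of(q.g)]
        if agent is not None
    }


def layer_by(agent: str) -> Set[Tuple[str, str, str]]:
    """All annotation triples from graphs attributed to `agent`."""
    return {(q.s, q.p, q.o) for q in quads if q.g != meta and agent_of(q.g) == agent}


# Dropping the graph column leaves two tags on one subject with no trace of their origin.
plain = {(q.s, q.p, q.o) for q in quads if q.g != meta}
print(len(plain))               # 2
print(who_tagged(token, "VB"))  # {'http://example.org/agent/taggerB'}
```

The trade-off is granularity: provenance is stated once per graph, so a single correction within a layer would need its own graph or a finer-grained mechanism.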

We also looked at annotations that correct a previous layer of data, such as a manual correction of the output of an automated tagging service. Once the automated tagger's results are published in NIF, there is no easy way to add information about individual corrections of those results.
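For the correction case, a comparable sketch of option i (reifying each annotation triple as an rdf:Statement and attaching provenance to that node) might look as follows. This is only an illustration under assumptions: the URIs are hypothetical, and linking a manual correction to the single automated annotation it replaces via prov:wasRevisionOf is a choice made for this example, not an agreed NIF convention.

```python
from typing import List, Optional, Set, Tuple

# Standard RDF and PROV-O namespaces, NIF core namespace; EX is hypothetical.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
EX = "http://example.org/"

Triple = Tuple[str, str, str]


def reify(graph: List[Triple], stmt: str, s: str, p: str, o: str,
          agent: str, revises: Optional[str] = None) -> None:
    """Add one annotation as a reified rdf:Statement together with its provenance."""
    graph.extend([
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
        (stmt, PROV + "wasAttributedTo", agent),
    ])
    if revises is not None:
        # A correction points at exactly the one annotation it replaces.
        graph.append((stmt, PROV + "wasRevisionOf", revises))


def values(graph: List[Triple], s: str, p: str) -> List[str]:
    """All objects o such that (s, p, o) is in the graph."""
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]


def pos_statements(graph: List[Triple], token: str) -> Set[str]:
    """Reified statements that assign a PoS tag to `token`."""
    return {
        st for (st, p, o) in graph
        if p == RDF + "subject" and o == token
        and NIF + "posTag" in values(graph, st, RDF + "predicate")
    }


def current_tags(graph: List[Triple], token: str) -> Set[str]:
    """PoS tags for `token` that no other annotation has revised."""
    revised = {o for (_, p, o) in graph if p == PROV + "wasRevisionOf"}
    live = pos_statements(graph, token) - revised
    return {o for st in live for o in values(graph, st, RDF + "object")}


def revised_agents(graph: List[Triple], token: str) -> Set[str]:
    """Agents whose PoS tag for `token` was later corrected."""
    revised = {o for (_, p, o) in graph if p == PROV + "wasRevisionOf"}
    return {
        a for st in pos_statements(graph, token) & revised
        for a in values(graph, st, PROV + "wasAttributedTo")
    }


g: List[Triple] = []
token = EX + "doc1#char=0,5"
# Automated tag from a hypothetical tagger, then a manual correction of that single tag.
reify(g, EX + "ann/1", token, NIF + "posTag", "VB", EX + "agent/taggerA")
reify(g, EX + "ann/2", token, NIF + "posTag", "NN", EX + "agent/annotatorX",
      revises=EX + "ann/1")

print(current_tags(g, token))    # {'NN'}
print(revised_agents(g, token))  # {'http://example.org/agent/taggerA'}
```

The cost of this option is visible in the sketch: every annotation expands into at least five triples, whereas attaching provenance to a named graph (option ii) states it once per layer.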

This, in a nutshell, is what we are trying to achieve with provenance metadata in our current project. I will add further information as soon as possible, but I hope this gives you a better impression and helps avoid misunderstandings.

Best regards,
Peter Menke

--  
Peter Menke
SFB 673 "Alignment in Communication"
Project X1 "Multimodal Alignment Corpora"
Universität Bielefeld
Postfach 10 01 31
33501 Bielefeld

CITEC-Gebäude, Raum 2.309
Telefon (+49 521) 106-67328

On 18 December 2014 at 23:32:40, Philipp Cimiano (cimiano at cit-ec.uni-bielefeld.de) wrote:
> Dear Rob, all,
>  
> thanks for your answer. I should have been more precise in my
> question. We are actually building on PROV as well. So we are not
> looking into extending NIF by a provenance layer, but really working out
> best practices for representing provenance information in NIF building
> on PROV.
>  
> Our actual question is where to attach the provenance information to.
> We see two options: i) reifying all annotation triples and adding
> provenance information to the reified object representing the annotation,
> or ii) using named graphs and attaching provenance information to the graph.
>  
> Are there any experiences with these two options that you can share with us?
>  
> Best regards,
>  
> Philipp.
>  
> On 18.12.14 15:12, Rob H Warren wrote:
> > Philipp,
> >
> > What would be the advantage of extending NIF versus reusing the PROV ontology for this
> > purpose? I have used PROV for some projects and have not come across a corner case yet.
> >
> > -rhw
> >
> > On Dec 17, 2014, at 5:02 PM, nlp2rdf-request at lists.informatik.uni-leipzig.de wrote:  
> >> Dear all,
> >>
> >> as I briefly mentioned to Sebastian, we are working on adding a
> >> provenance layer to NIF, and we would like to talk to you about it.
> >>
> >> I sort of agreed with Sebastian that we could meet in Leipzig to discuss
> >> this.
> >>
> >> However, I think it is more efficient if we exchange material beforehand
> >> and have a telco early January to discuss the material.
> >>
> >> Sebastian: you mentioned that you have some proposals on how to
> >> represent provenance of NIF annotations. Can you share that proposal
> >> with us?
> >>
> >> Peter: could you circulate our draft to the people in Leipzig (i.e.,
> >> Sebastian and Martin)?
> >>
> >> We can then start from there and organize a telco in January.
> >>
> >> Best regards,
> >>
> >> Philipp.
> >>
> >> --
> >> --
> >> Prof. Dr. Philipp Cimiano
> >> AG Semantic Computing
> >> Exzellenzcluster für Cognitive Interaction Technology (CITEC)
> >> Universität Bielefeld
> >>
> >> Tel: +49 521 106 12249
> >> Fax: +49 521 106 6560
> >> Mail: cimiano at cit-ec.uni-bielefeld.de
> >>
> >> Office CITEC-2.307
> >> Universitätsstr. 21-25
> >> 33615 Bielefeld, NRW
> >> Germany