[NLP2RDF] Extending NIF Ontologies

Fri Jan 13 07:44:44 CET 2012

Dear Carina,
Nice to hear that NIF fits your use case. I have several comments inline.
My answers are a little fuzzy, as I would need more concrete examples to 
answer more precisely.

On 01/11/2012 05:06 PM, Carina Haupt wrote:
> Hi,
>
> I am part of the OpenPHACTS project and it is my task to present 
> textmining results in RDF. Therefore I am using the String as well as 
> the SSO ontology of NIF. It allows me to represent most of my 
> information, but unfortunately some predicates and classes are missing 
> for my use case.
> What I want to do is to represent not only the annotations made by the 
> text mining tool, but also the texts they were found in, as well as 
> the concepts which the annotations represent.
> To represent the texts I use dc-term and to describe the concepts skos.
NIF uses URIs to represent texts or fragments of text, so I do not 
really understand what you mean by "To represent the texts I use 
dc-term". Does it mean you annotate NIF URIs (representing texts) with 
the dcterms vocabulary?

>
> What I am missing is the connection between a concept and an 
> annotation, as well as a type for the annotation itself. In my 
> Institute (Fraunhofer SCAI), we call such an annotation a hit. To be 
> able to complete my RDF schema I extended the ontology by adding 
> pao:Hit and pao:incarnationOf (pao stands for Prominer Annotation 
> Ontology and is based on SSO). pao:Hit thereby is a subclass of 
> string:String and sso:incarnationOf needs a skos:Concept as domain and 
> has pao:Hit as range.
Hm, as far as I understand it, the pao:incarnationOf property is quite 
similar to the scms:means property, which is used to connect Strings 
with DBpedia entities. So your basic use case is that you have a text 
with "Mentions" or "Hits" which are mapped to "Concepts". Using your 
own property for this is fine. You could also use dcterms:subject 
directly. An example would really help here. Can we have a look at the 
Prominer ontology? We could include the ontology in NIF and also 
generate Java classes (OWL2Java) and documentation (OWLDoc) for it.
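
To make the two options concrete, here is a minimal sketch in Python 
that writes the triples as N-Triples. The pao namespace, the document 
URI and the concept URI are placeholders I made up, since I have not 
seen the real ones. The direction hit -> concept is also an assumption; 
swap subject and object if the ontology defines it the other way round.

```python
# RDF and DCTERMS are the standard namespaces. PAO is a placeholder,
# because the real Prominer ontology namespace is not given in this thread.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
PAO = "http://example.org/pao#"  # hypothetical namespace


def triple(s, p, o):
    """Serialize one triple of three URIs as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (s, p, o)


def link_hit(hit_uri, concept_uri, use_dcterms=False):
    """Type a string URI as pao:Hit and connect it to its concept.

    Assumes the direction hit -> concept for both properties.
    """
    prop = DCTERMS_SUBJECT if use_dcterms else PAO + "incarnationOf"
    return [
        triple(hit_uri, RDF_TYPE, PAO + "Hit"),
        triple(hit_uri, prop, concept_uri),
    ]


if __name__ == "__main__":
    # Both URIs below are invented for illustration only.
    hit = "http://example.org/doc1#hit1"
    concept = "http://example.org/concepts/C0001"
    for line in link_hit(hit, concept):
        print(line)
```

Calling link_hit(hit, concept, use_dcterms=True) gives the 
dcterms:subject variant with the same typing triple.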

>
> Next to the text mining results I also store provenance information 
> where I need to describe the used text mining tools. I think this use 
> case is not covered by NIF so far, but should be suggested in further 
> development. In my case I added the class pao:Annotator and the 
> predicate pao:annotatorClass.
That is a problem of RDF in general, and it is indeed an issue that has 
not been solved. In general, the NIF architecture pushes this problem to 
the client: when the client sends a request, it receives RDF data and 
then needs to store it in a way that lets it attach provenance 
information. For example, it could create one named graph for each tool 
it requested, or partition at a finer granularity (e.g. one named graph 
per tool and per request). Another possibility is to use the "prefix" 
variable and encode the provenance in the URI, e.g. 
http://prominer.org/syntax/tool/doc1#....
If you would like to annotate individual triples, OWL axiom 
annotations would be a possibility, although they increase the size of 
the model immensely.
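
The named-graph option could be sketched on the client side roughly as 
follows, in Python. It assumes the tool output arrives as N-Triples 
lines and stores it as N-Quads. The graph base URI, the tool name and 
the request id are hypothetical; nothing here is prescribed by NIF.

```python
def graph_uri(base, tool, request_id=None):
    """Build a named-graph URI per tool, optionally per request."""
    uri = base.rstrip("/") + "/" + tool
    if request_id is not None:
        uri += "/" + str(request_id)
    return uri


def to_nquads(ntriples_lines, graph):
    """Turn N-Triples lines into N-Quads by appending the graph URI."""
    quads = []
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError("not an N-Triples statement: %r" % line)
        # Drop the final "." and put the graph label in front of it.
        body = line[:-1].rstrip()
        quads.append("%s <%s> ." % (body, graph))
    return quads


if __name__ == "__main__":
    # Hypothetical base URI, tool name and request id.
    g = graph_uri("http://example.org/graphs/", "prominer", 42)
    data = ["<http://example.org/s> <http://example.org/p> <http://example.org/o> ."]
    for quad in to_nquads(data, g):
        print(quad)
```

Leaving out request_id gives one graph per tool, and passing it gives 
the finer partitioning per tool and per request.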

For your specific application you might also combine RDF with a 
relational database table with three columns: varChar:TripleID, 
varChar:key, varChar:value.
As TripleID you can use the MD5 hash of the N-Triples serialization of 
the triple. This should be sufficiently unique. Do not forget to index 
the first column, otherwise lookups will be very slow. key and value 
hold the provenance info. It only works one way, of course: you can get 
from a triple to its hash, but not from the hash back to the triple.
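
A minimal sketch of such a table, using Python with SQLite only as an 
example database. The table name, the column names (prov_key and 
prov_value instead of key and value, to stay clear of SQL keywords) and 
the sample data are my own choices for illustration.

```python
import hashlib
import sqlite3


def triple_id(ntriple_line):
    """MD5 hex digest of the N-Triples serialization of one triple."""
    return hashlib.md5(ntriple_line.strip().encode("utf-8")).hexdigest()


def open_store(path=":memory:"):
    """Create the provenance table and the index on its first column."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS provenance ("
        "triple_id VARCHAR(32) NOT NULL, "
        "prov_key VARCHAR NOT NULL, "
        "prov_value VARCHAR NOT NULL)"
    )
    # Without this index every lookup scans the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_provenance_triple "
        "ON provenance (triple_id)"
    )
    return conn


def add_provenance(conn, ntriple_line, key, value):
    """Attach one key/value pair to a triple."""
    conn.execute(
        "INSERT INTO provenance VALUES (?, ?, ?)",
        (triple_id(ntriple_line), key, value),
    )
    conn.commit()


def lookup(conn, ntriple_line):
    """Return all (key, value) pairs stored for a triple, in insert order."""
    rows = conn.execute(
        "SELECT prov_key, prov_value FROM provenance "
        "WHERE triple_id = ? ORDER BY rowid",
        (triple_id(ntriple_line),),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_store()
    # Invented triple and provenance values.
    t = "<http://example.org/s> <http://example.org/p> <http://example.org/o> ."
    add_provenance(conn, t, "tool", "ProMiner")
    print(lookup(conn, t))
```

Note that the hash depends on the exact serialization, so the same 
triple has to be written the same way (whitespace, escaping) every 
time, otherwise the lookup will miss.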

Regards,
Sebastian

-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org, http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org