[NLP2RDF] Fwd: Merging of the NIF Core Ontology and FISE from Apache Stanbol (and some other news)

Rupert Westenthaler rupert.westenthaler at gmail.com
Wed Jul 10 16:54:30 CEST 2013


Hi all,

I forgot to include the lists in my last reply to Sebastian, so please
see the forwarded message below.

best
Rupert


---------- Forwarded message ----------
From: Rupert Westenthaler <rupert.westenthaler at gmail.com>
Date: Wed, Jul 10, 2013 at 3:10 PM
Subject: Re: [NLP2RDF] Merging of the NIF Core Ontology and FISE from
Apache Stanbol (and some other news)
To: Sebastian Hellmann <hellmann at informatik.uni-leipzig.de>


Hi Sebastian

On Wed, Jul 10, 2013 at 2:02 PM, Sebastian Hellmann
<hellmann at informatik.uni-leipzig.de> wrote:
> 1.
> fise:extractedFrom should actually be nif:referenceContext when it links a
> TextAnnotation to a nif:Context, and some other property when it links an
> EntityAnnotation to a Context/ContentItem.
> Ideally, we would model this with an OWL property chain. Together with
> Giuseppe Rizzo and Raphael Troncy, we made a draft here (page 5, section 4.2):
> http://events.linkeddata.org/ldow2012/papers/ldow2012-paper-02.pdf
>
> This is a general feature: being able to query annotations at the
> ContentItem level.
> The idea in the paper was to have dc:relation and then infer
> fise:extractedFrom as a shortcut for queries.
> Would this be an acceptable solution for Stanbol, or do you require
> fise:extractedFrom to be materialized in the RDF output?

Yes, we do, as Stanbol does not expect clients processing enhancement
results to support any kind of reasoning. All fise:Enhancement
instances need to define this property so that users can simply get
a list of all enhancements.
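To illustrate the materialization requirement: the inference Sebastian describes can be run on the server before the graph is returned, so clients never need a reasoner. This is a minimal sketch, assuming triples are plain Python tuples of prefixed names, and assuming one plausible chain (dc:relation followed by nif:referenceContext) for EntityAnnotations; the `ex:` resources and the function name are hypothetical, and this is not Stanbol code.

```python
from __future__ import annotations


def materialize_extracted_from(
    triples: set[tuple[str, str, str]],
) -> set[tuple[str, str, str]]:
    """Return the graph plus inferred fise:extractedFrom triples.

    Assumed rules (for illustration only):
      x nif:referenceContext c                     => x fise:extractedFrom c
      e dc:relation t . t nif:referenceContext c   => e fise:extractedFrom c
    """
    # Index nif:referenceContext objects by subject for the chain step.
    context_of: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "nif:referenceContext":
            context_of.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in triples:
        if p == "nif:referenceContext":
            # Direct case, e.g. a TextAnnotation pointing at its context.
            inferred.add((s, "fise:extractedFrom", o))
        elif p == "dc:relation":
            # Chain case: EntityAnnotation -> TextAnnotation -> context.
            for ctx in context_of.get(o, set()):
                inferred.add((s, "fise:extractedFrom", ctx))
    return triples | inferred


# Hypothetical example graph (ex: resources are made up).
graph = {
    ("ex:ta1", "nif:referenceContext", "ex:ctx"),
    ("ex:ea1", "dc:relation", "ex:ta1"),
}
result = materialize_extracted_from(graph)

# With the shortcut materialized, listing all enhancements needs no reasoner.
enhancements = sorted(s for s, p, _ in result if p == "fise:extractedFrom")
print(enhancements)  # ['ex:ea1', 'ex:ta1']
```

The design choice mirrors the thread: the property chain stays the formal definition, while the server writes out its consequences so that a simple query on fise:extractedFrom is enough on the client side.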

> I created an issue here:
> https://github.com/NLP2RDF/persistence.uni-leipzig.org/issues/3
>
> 2. The issue is created here:
> https://github.com/NLP2RDF/persistence.uni-leipzig.org/issues/4
>
> It seems to me that we might be able to reuse an existing property if an
> entity URI is given. Instead of creating a new property, we can just define
> one to be reused in the NIF specification:
> "In case an entity URI is given, rdfs:label should/must be used."
>
> Do you really require so many shortcut properties? Is it to
> avoid joins in SPARQL queries?

I agree that those properties are not strictly required. The reason
for those properties is to make the enhancement results
self-contained.

The Stanbol Enhancer provides a RESTful service where the user sends
some text and receives an RDF graph. The intention is that the returned
graph contains all the knowledge necessary to process the result.

Omitting:
* fise:selected-text would require the user to extract the anchor
text from the sent content. If the content was a PDF document, this
could be a very complex task. Hence we include this information in the
graph.
* fise:entity-type would require the user to obtain the RDF for the
referenced entity (e.g. by making another HTTP lookup), maybe even to
another host. If that server does not support CORS, this might not
succeed within a browser. Hence we include this information in the
enhancement results. In addition, users can configure EntityLinking to
use a property other than rdf:type as the entity type. E.g. for Geonames
one could use "geonames:featureCode" as the type property, as the values
of this property provide a better classification of the linked entities.
* fise:entity-label: basically the same as for fise:entity-type, but
in this case it also provides the actual label that was matched. For a
client it could be quite complex to determine the best-matching label
for the "fise:selected-text", especially if the entity has a lot of
alternate labels, if the matching process used NLP techniques such as
lemmatization or stemming, or if the matching process supports tokens in
a different order (e.g. the label reads "{given-name} {family-name}" but
the mention uses "{family-name} {given-name}") ...
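To give a feel for why a client would struggle to reproduce fise:entity-label, here is a deliberately naive sketch of picking the best-matching label when the token order differs. The token-overlap scoring, the function names and the example labels are all assumptions for illustration; real entity linking (lemmas, stemming, fuzzy matching) is considerably more involved.

```python
from __future__ import annotations

from collections import Counter


def tokens(text: str) -> Counter:
    """Lower-cased whitespace tokens as a bag; word order is ignored."""
    return Counter(text.lower().split())


def overlap(a: Counter, b: Counter) -> float:
    """Shared tokens divided by the size of the larger bag (0.0 to 1.0)."""
    shared = sum((a & b).values())
    larger = max(sum(a.values()), sum(b.values()))
    return shared / larger if larger else 0.0


def best_matching_label(mention: str, labels: list[str]) -> str | None:
    """Pick the candidate label whose tokens best overlap the mention."""
    if not labels:
        return None
    mention_tokens = tokens(mention)
    # max() keeps the first label on ties.
    return max(labels, key=lambda label: overlap(mention_tokens, tokens(label)))


# Hypothetical alternate labels of one entity; the mention swaps name order.
labels = ["H. Müller", "Müller", "Hans Müller"]
print(best_matching_label("Müller Hans", labels))  # Hans Müller
```

Even this toy version needs the entity's full label set and a scoring policy, which is information the server already has at linking time.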

So adding those additional triples seems to be a good trade-off for a
RESTful service. In usage scenarios where one does not need this
information, one could simply remove it in a post-processing
engine.
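A post-processing step of this kind could be as simple as filtering out the shortcut predicates. This is a sketch under the assumption that the graph is a set of (subject, predicate, object) tuples; the predicate set and function name are hypothetical and do not reflect the API of an actual Stanbol enhancement engine.

```python
from __future__ import annotations

# The shortcut predicates discussed above; the set itself is an assumption.
SHORTCUT_PREDICATES = frozenset(
    {"fise:selected-text", "fise:entity-type", "fise:entity-label"}
)


def strip_shortcuts(
    triples: set[tuple[str, str, str]],
    drop: frozenset[str] = SHORTCUT_PREDICATES,
) -> set[tuple[str, str, str]]:
    """Return a copy of the graph without triples using the given predicates."""
    return {t for t in triples if t[1] not in drop}


# Hypothetical enhancement results (ex: resources are made up).
graph = {
    ("ex:ta1", "fise:selected-text", "London"),
    ("ex:ea1", "fise:entity-reference", "dbpedia:London"),
    ("ex:ea1", "fise:entity-label", "London"),
}
slim = strip_shortcuts(graph)
print(sorted(slim))  # [('ex:ea1', 'fise:entity-reference', 'dbpedia:London')]
```

Keeping the redundant triples by default and dropping them on request keeps the REST response self-contained while letting storage-oriented deployments stay lean.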

> The trade-off is additional triples versus additional query patterns (please
> excuse any syntax errors):
>
> with shortcut
>
> Select ?label {
>     ?s a nif:EntityAnnotation .
>     ?s  fise:entity-label  ?label .
> }
>
> or alternatively (all three possible):
>
>
> Select ?label {
>     ?s a nif:EntityAnnotation .
>     ?s itsrdf:taIdentRef ?entity .
>     ?entity  rdfs:label  ?label .
> }
>

How would you know that '?entity' uses the rdfs:label property for its
labels? It might also be foaf:name, skos:prefLabel, schema:name ...
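The fragility can be made concrete: without a shortcut such as fise:entity-label, a client has to guess which property carries labels. The sketch below tries a hypothetical, hard-coded preference list of label properties and assumes the entity descriptions have already been dereferenced into a set of tuples; any label stored under a property outside the list is silently missed.

```python
from __future__ import annotations

# Hypothetical preference order; a real client would need this per dataset.
LABEL_PROPERTIES = ("rdfs:label", "skos:prefLabel", "foaf:name", "schema:name")


def find_labels(
    entity: str,
    triples: set[tuple[str, str, str]],
    properties: tuple[str, ...] = LABEL_PROPERTIES,
) -> list[str]:
    """Return the labels of `entity` from the first property that has any."""
    for prop in properties:
        found = sorted(o for s, p, o in triples if s == entity and p == prop)
        if found:
            return found
    # Labels stored under any other property are silently missed.
    return []


# Hypothetical, already dereferenced entity descriptions.
graph = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:place", "gn:name", "Bischofshofen"),
}
print(find_labels("ex:alice", graph))  # ['Alice']
print(find_labels("ex:place", graph))  # [] because gn:name is not in the list
```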

> or
>
> SELECT ?label {
>     ?s a nif:EntityAnnotation .
>     ?s  dc:relation ?string .
>     ?string a nif:TextAnnotation .
>     ?string nif:anchorOf ?label .
> }
>

The label of the entity might differ from the anchor text, e.g. if the
text mentions a person with given and family name in reverse order, or
if the mention uses the plural form of an entity's name.

> or even very generic, using http://www.w3.org/TR/sparql11-query/#func-substr
> (no extra property needed)
>
> SELECT ?label {
>     ?s a nif:EntityAnnotation .
>     ?s  dc:relation ?string .
>     ?string a nif:TextAnnotation , nif:RFC5147String  .
>     ?string nif:beginIndex ?b .
>     ?string nif:endIndex ?e .
>     ?string nif:referenceContext  ?contentItem .
>     ?contentItem nif:isString ?text .
>     BIND (SUBSTR (?text, ?b + 1, ?e - ?b) AS ?label) .  # SUBSTR counts from 1
> }
>

This would not work if the submitted content was in a rich-text format
(e.g. a PDF document).
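Even where the context string is the plain text itself, the offset arithmetic passed to SUBSTR needs care: SPARQL 1.1 SUBSTR counts positions from 1, while the begin/end indices are assumed here to be 0-based and end-exclusive (as in RFC 5147 fragment identifiers). The sketch below emulates SUBSTR in Python to show the off-by-one; the helper names and the example sentence are hypothetical.

```python
def sparql_substr(text: str, start: int, length: int) -> str:
    """Emulate SPARQL 1.1 SUBSTR for start >= 1 (positions are 1-based)."""
    return text[start - 1 : start - 1 + length]


def anchor_of(text: str, begin: int, end: int) -> str:
    """Slice a mention using 0-based, end-exclusive offsets."""
    return text[begin:end]


context = "Paris is the capital of France."
begin, end = 24, 30  # assumed 0-based offsets of "France"

print(repr(anchor_of(context, begin, end)))                  # 'France'
print(repr(sparql_substr(context, begin + 1, end - begin)))  # 'France'
print(repr(sparql_substr(context, begin, end - begin)))      # ' Franc'
```

The last line shows what happens when the 0-based begin index is passed to SUBSTR unchanged: the result is shifted one character to the left.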


best
Rupert



>
> Thanks for the feedback,
> Sebastian
>
> On 10.07.2013 12:43, Rupert Westenthaler wrote:
>
>> Hi Sebastian,
>>
>> Thanks for all your effort, Sebastian!
>>
>> Sorry for my late response, but I am traveling this week and next, so I
>> do not have enough time to look into this properly. At least I had the
>> chance to have a look at the mappings [4]. I will try to provide some
>> initial feedback with this mail.
>>
>>
>> ### The fise:entity-label
>>
>> regarding the comment:
>>
>> # ??? not sure about fise:entity-label
>> # This should be dbpedia:London rdfs:label "London"@en ;
>> #
>> # fise:entity-label
>> #    a owl:DatatypeProperty ;
>> #    rdfs:comment "the label(s) of the referenced Entity"@en ;
>> #    rdfs:domain :EntityAnnotation .
>>
>>
>> This property is intended to hold the label of the entity that best
>> matched the mention in the text. This will not always be the
>> rdfs:label; it depends on the ontology used by the suggested entity.
>> Labels might also come from multiple properties (e.g. in the case of
>> SKOS, where both skos:prefLabel and skos:altLabel hold label
>> information).
>>
>> In addition, the fise:entity-type and fise:entity-label values are
>> intended to be used for visualization purposes in cases where the
>> client does not dereference the linked (fise:entity-reference) entity.
>>
>>
>> ### Non-FISE namespace properties
>>
>> I would also like to mention that the Stanbol Enhancement Structure
>> also uses some Dublin Core properties (e.g. fise:EntityAnnotation uses
>> dc:relation to link to the fise:TextAnnotation; see [9] for a figure
>> and [10] for the full documentation). AFAIU those non-FISE namespace
>> properties are not included in the current mapping.
>>
>> best
>> Rupert
>>
>> [9] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.png
>> [10] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html
>>
>> On Mon, Jul 8, 2013 at 12:35 PM, Sebastian Hellmann
>> <hellmann at informatik.uni-leipzig.de> wrote:
>>>
>>> Good News Everyone!
>>>
>>> Yesterday, I had a more detailed look at the FISE Ontology [1], which is
>>> supposed to provide an output format for Apache Stanbol [2]. After a while
>>> I found out that they fit together without major problems, so I went ahead
>>> and created a merging proposal! The NIF Core Ontology [3] is now dually
>>> licensed under CC-By 3.0 and Apache 2.0 to ease integration.
>>> The NIF inf model ([4], line 92) documents the mapping from NIF-Core to
>>> FISE. One minor problem is that FISE uses xsd:int and not
>>> xsd:nonNegativeInteger for indices.
>>> The mapping is complete except for the (probably unnecessary)
>>> fise:entity-label.
>>> (cross-posting to the Stanbol and NLP2RDF lists)
>>>
>>> Due to this merger, the creation of a release candidate for NIF 2.0 is
>>> coming along quite well now. The seven properties/classes at the core
>>> (Context, String, isString, RFC5147String, endIndex, beginIndex,
>>> referenceContext) are already stable.
>>>
>>> We included a lot of people in the attribution section [5] as well, but
>>> the list is not yet exhaustive. There is also a NIF validator and a
>>> logging ontology. The validator uses SPARQL 1.1 to produce log output
>>> adhering to the logging ontology.
>>> * Jar for CLI (no web service yet) [6]
>>> * Readme [7]
>>> * The SPARQL queries [8]
>>>
>>>
>>> Please feel free to:
>>> * download the validator and test it on your NIF implementation to see
>>>   what changed
>>> * check if we spelled your name correctly ;) in the attribution section [5]
>>> * tell us whether we should add your logo in the maintainer/supporter
>>>   section
>>> * check and extend the ontology and write some additional SPARQL
>>>   queries [8]
>>>
>>> Since we have now moved to GitHub, we are also eager to give out push
>>> permissions...
>>>
>>> All the best,
>>> Sebastian
>>>
>>>
>>> [1] http://fise.iks-project.eu/ontology/
>>> [2] http://stanbol.apache.org/
>>> [3] https://github.com/NLP2RDF/persistence.uni-leipzig.org/blob/master/ontologies/nif-core/version-1.0/nif-core.ttl
>>> [4] https://github.com/NLP2RDF/persistence.uni-leipzig.org/blob/master/ontologies/nif-core/version-1.0/nif-core-inf.ttl#L92
>>> [5] http://persistence.uni-leipzig.org/nlp2rdf/#attribution
>>> [6] https://github.com/NLP2RDF/java-maven/raw/master/validate.jar
>>> [7] https://github.com/NLP2RDF/java-maven/blob/master/README.md#nif-validator
>>> [8] https://github.com/NLP2RDF/java-maven/tree/master/core/jena/src/main/resources/sparqltest
>>>
>>> --
>>> On holiday from 11 July until 3 August
>>>
>>> Dipl. Inf. Sebastian Hellmann
>>> Department of Computer Science, University of Leipzig
>>> Events:
>>> * NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Extended
>>> Deadline: *July 18th*)
>>> * LSWT 23/24 Sept, 2013 in Leipzig (http://aksw.org/lswt)
>>> Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
>>> _______________________________________________
>>> NLP2RDF mailing list
>>> NLP2RDF at lists.informatik.uni-leipzig.de
>>> http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
>>
>>
>>
>
>



--
| Rupert Westenthaler             rupert.westenthaler at gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen



