[NLP2RDF] Changes to NIF Ontologies was: (Re: Extending NIF Ontologies)

Sebastian Hellmann hellmann at informatik.uni-leipzig.de
Fri Mar 9 07:30:09 CET 2012


Hi Carina,
I think that NIF is complementary to the annotation ontology, so it is 
very good if we can find a way to combine it.
You are also right that ontologies need endless discussion and time. I 
talked to several people and we are planning to create some sort of 
advisory board, where any stakeholder might join in. So you are welcome 
to join this.
I will be on holiday until the beginning of April and then we will start 
talking about NIF 2.0.
Here are the most recent slides for NIF: 
http://www.slideshare.net/kurzum/thesis-presentation-11928355
I think it can become a KR formalism that combines the WWW, GGG and NLP 
in a transparent way for machines.
We will also try to trim down size and complexity, so we achieve 
scalability.

All the best,
Sebastian

On 03/08/2012 03:37 PM, Carina Haupt wrote:
> Hi Sebastian,
>
> sorry for the late reply, but we had some further schema development 
> on our side.
>
>
>
> On 16.02.2012 18:40, Sebastian Hellmann wrote:
>> Hello Carina,
>> finally I found some time to review everything and think about an
>> extension of NIF. I talked extensively to Raphaël Troncy and
>> Giuseppe Rizzo of the NERD project [1] and we seem to converge finally.
>> Please give us feedback. I attached an early draft, which is already
>> outdated again.
>> Here are the *proposed* changes:
>>
>> 1. the offset URI will not have a human readable part any more. It
>> serves no function after all.
>> 2. The class str:Document is replaced by str:Context. Context is
>> defined in the attached PDF.
>> 3. All URIs of type str:String have to refer to an element of the
>> powerset of the concatenation of Unicode characters. So they will get a
>> strict formal interpretation.
>> 4. scms:means will be removed.
>> 5. we will include 2 properties sso:oen and sso:oec which allow
>> attaching Linked Data URIs to Strings. (These stand for One Entity per
>> Name and One Entity per Context.)
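
A minimal sketch of changes 1 and 3, assuming the `offset_x_y` identifier pattern quoted further down in this thread and assuming that offsets count Unicode code points with an exclusive end (the proposal does not fix either detail). The function names `offset_fragment` and `resolve_offset` are hypothetical.

```python
import re

# Hypothetical pattern, modelled on the ":offset_x_y" identifier quoted in this thread.
_OFFSET = re.compile(r"^offset_([0-9]+)_([0-9]+)$")


def offset_fragment(begin: int, end: int) -> str:
    """Build an offset identifier that carries no human readable part."""
    if not 0 <= begin <= end:
        raise ValueError("need 0 <= begin <= end")
    return f"offset_{begin}_{end}"


def resolve_offset(context: str, fragment: str) -> str:
    """Return the substring of the context that the identifier denotes."""
    match = _OFFSET.match(fragment)
    if match is None:
        raise ValueError(f"not an offset identifier: {fragment!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if begin > end or end > len(context):
        raise ValueError("offsets do not fit the context")
    # Python str is indexed by Unicode code point, which matches the assumption above.
    return context[begin:end]


context = "Zürich is a city."
fragment = offset_fragment(0, 6)
print(fragment, "->", resolve_offset(context, fragment))
```

Under these assumptions every identifier maps to exactly one substring of its context, which is the strict interpretation that change 3 asks for.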
>
> I think these changes make sense. But I would suggest not naming the 
> predicates sso:oec and sso:oen, since these labels are not 
> understandable without background knowledge.
>
>> For the string to entity part, we should explain the so-called variant
>> 1, and mention 3 cases:
>> - case 1: a NER extractor has provided a linked data URI to disambiguate
>> the entity ... we re-use it
>> - case 2: a NER extractor has provided a non-linked data URI to
>> disambiguate the entity (typically, the foaf:homepage of an
>> organization) ... we mint a new linked data URI
>> - case 3: a NER extractor does not provide disambiguation links ... we
>> mint a new linked data URI
>>
>> We are still unsure what the Linked Data URI will look like though...
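
The three cases can be sketched as a single decision, assuming the caller supplies the test for what counts as a linked data URI (the thread leaves that definition open) and assuming a placeholder base for minted URIs. `MINT_BASE`, `entity_uri` and the `example.org` namespace are hypothetical and only illustrate the control flow.

```python
from typing import Callable, Optional
from urllib.parse import quote

# Hypothetical base for minted URIs; the final URI shape is still undecided.
MINT_BASE = "http://example.org/entity/"


def entity_uri(
    label: str,
    extractor_uri: Optional[str],
    is_linked_data: Callable[[str], bool],
) -> str:
    """Pick the entity URI for a recognized string."""
    # Case 1: the extractor gave a linked data URI, so it is re-used.
    if extractor_uri is not None and is_linked_data(extractor_uri):
        return extractor_uri
    # Case 2 (non-linked data URI) and case 3 (no URI): mint a new one.
    return MINT_BASE + quote(label, safe="")


def in_dbpedia(uri: str) -> bool:
    """Toy predicate, used here only so the example runs."""
    return uri.startswith("http://dbpedia.org/resource/")


print(entity_uri("Leipzig", "http://dbpedia.org/resource/Leipzig", in_dbpedia))
print(entity_uri("University of Leipzig", "http://www.uni-leipzig.de/", in_dbpedia))
print(entity_uri("Leipzig", None, in_dbpedia))
```

Cases 2 and 3 share one branch here; a real implementation would probably keep the non-linked data URI from case 2 as an extra link on the minted resource.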
>
> We are actually linking our entity to our own data (or concept) URI, 
> which in turn is linked to an existing data (or concept) URI.
> What exactly do you mean by "linked" data URI? Does it mean that the 
> URI has to be part of an existing database, or that it has to be 
> dereferenceable, or just that it has to be a URI at all?
>
> Regarding the concept and hit issue: We decided to use the AO 
> (Annotation Ontology) schema. The advantage of their schema is that 
> they already deal with versioning and with storing provenance 
> information for annotation sets, and especially that they can also 
> handle e.g. images, which is one of our next steps. Here sso does not 
> fit our needs, since we want to do not only text mining but also data 
> mining.
> But we will still use sso. We plan to use the AO schema only for the 
> basic structure to connect the documents with the annotations and 
> these with the concepts (which have to be of type skos:Concept or we 
> could not use skos to describe the relations between the concepts).
> AO also has so-called Selectors which describe the annotation. We want 
> these to be subclasses of NamedEntity, Relation, Image, etc., and we 
> would like to reuse all text mining related classes which become part 
> of sso.
>
> Best regards,
> Carina
>
>> It is still open how to attach skos:concepts :
>>
>> On 01/18/2012 02:44 PM, Carina Haupt wrote:
>>> I would propose to generate a property like pao:incarnationOf
>>> (actually I am not 100% happy with this expression), which needs a
>>> pao:Hit as domain and skos:Concept as range, and also is a subproperty
>>> of dc-terms:subject and perhaps also of scms:means. But to be able to
>>> include scms:means, we would first need to have its definition, so
>>> that we can check if everything is consistent.
>>
>> I would suggest naming the class "NamedEntity" as this would cover all
>> three occurrences (OEN, OEC, skos:Concept).
>> Here is what Raphael said:
>>
>>> dcterms:subject seems to fit well:
>>> :offset_x_y dcterms:subject
>>> <http://dbpedia.org/resource/Category:International_nongovernmental_organizations> 
>>>
>>>
>>
>> "I thought about that ... but this predicate is very general, on
>> purpose, while I think here we want to be a bit more precise, stating
>> that a particular string of chars, that happens to be recognized as the
>> label of a real world named entity, occurs within a context ... so I
>> would prefer creating a new predicate to materialize this semantics,
>> thus the sso:oen ... now I'm happy if you define this term in the sso
>> ontology or at least if we agree on the definition. "
>>
>> sso:oen could have a NamedEntity as Domain. It could cover both use
>> cases, i.e. any Entity including skos:Concepts. Or we could make a
>> separate Property, having NamedEntity as Domain and skos:Concept as
>> Range. I am also not 100% happy with calling it "sso:incarnationOf". Any
>> suggestions?
>>
>> I am not sure how your provenance model can cope with the new grounding
>> of String and Context on Unicode. I hope it separates the layers more
>> nicely now...
>> If I look at your image Datenschema.png I think you would need to
>> replace str:Document with foaf:Document and then define a str:Context
>> node and connect via a property. Should we call it str:occursIn with
>> Domain str:Context and Range foaf:Document?
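
The str:occursIn proposal can be sketched as follows, assuming prefixed names stand in for full URIs (the namespace URIs are not spelled out in this thread) and treating domain and range as a plain consistency check rather than RDFS inference. `Property`, `violations`, `:context_1` and `:doc_1` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class Property(NamedTuple):
    name: str
    domain: str
    range_: str


# The property as proposed: Domain str:Context, Range foaf:Document.
OCCURS_IN = Property("str:occursIn", "str:Context", "foaf:Document")


def violations(
    triples: List[Triple],
    types: Dict[str, Set[str]],
    prop: Property = OCCURS_IN,
) -> List[Triple]:
    """List triples using `prop` whose subject or object lacks the declared type."""
    bad = []
    for s, p, o in triples:
        if p != prop.name:
            continue
        subject_ok = prop.domain in types.get(s, set())
        object_ok = prop.range_ in types.get(o, set())
        if not (subject_ok and object_ok):
            bad.append((s, p, o))
    return bad


types = {":context_1": {"str:Context"}, ":doc_1": {"foaf:Document"}}
triples = [
    (":context_1", "str:occursIn", ":doc_1"),  # Context occurs in Document
    (":doc_1", "str:occursIn", ":context_1"),  # wrong direction
]
print(violations(triples, types))
```

Only the second triple is reported, because its subject is a foaf:Document rather than a str:Context.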
>>
>> Sorry again for answering so late. Ontologies seem to need endless
>> discussions. But I think, we are close to covering the core concepts of
>> the NERD domain ....
>> All the best,
>> Sebastian
>>
>> [1] http://nerd.eurecom.fr
>>
>>
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
