[NLP2RDF] document and corpus level aggregates

Sebastian Hellmann hellmann at informatik.uni-leipzig.de
Fri May 31 11:01:13 CEST 2013


Excellent, thanks, I created an issue here:
https://github.com/NLP2RDF/persistence.uni-leipzig.org/issues/2

For all participating researchers, we could submit an ontology 
description here:
http://www.semantic-web-journal.net/authors

I wouldn't mind having a lot of authors on such a submission. I think 
this one could be cited quite often once it is published.
I will set up a paper for this soon (in a week or two); everyone is 
welcome to join.
All the best,
Sebastian


/Descriptions of ontologies/ – short papers describing ontology modeling 
and creation efforts. The descriptions should be brief and pointed, 
indicating the design principles, methodologies applied at creation, 
comparison with other ontologies on the same topic, and pointers to 
existing applications or use-case experiments. It is strongly 
encouraged that the described ontologies are free, open, and accessible 
on the Web. If this is not possible, the ontologies have to be made 
available to the reviewers. For commercial ontologies, exceptions can be 
arranged through the editors. These submissions will be reviewed along 
the following dimensions: (1) Quality and relevance of the described 
ontology (convincing evidence must be provided). (2) Illustration, 
clarity and readability of the describing paper, which shall convey to 
the reader the key aspects of the described ontology.

On 30.05.2013 12:40, David Lewis wrote:
> I think some of these issues can only be addressed by understanding 
> where any particular NIF model sits within a processing chain. The 
> best way to record this, in my view, is using the provenance ontology 
> (PROV-O). In MLW-LT we've been looking at how to integrate NIF and 
> PROV-O, in particular for localisation processing chains, see:
>
> http://www.w3.org/International/its/wiki/Provenance_Best_Practice
>
> This is just a rough example, but we are going to be updating it and 
> adding better documentation shortly. We already have an implementation, 
> named CMS-LION, running successfully.
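>
> For illustration only (a rough sketch with invented example URIs, not 
> the pattern from the wiki page above; it assumes the standard prov: 
> and nif-core namespaces), the link between a NIF context and the 
> processing step that produced it could look like this:
>
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix nif:  <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
> @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
>
> # the NIF context produced by one step of the chain
> <http://example.com/exampledoc.html#char=0,29>
>     a nif:Context ;
>     prov:wasGeneratedBy <http://example.com/activity/nlp-run-1> .
>
> # the processing step itself, recorded as a PROV activity
> <http://example.com/activity/nlp-run-1>
>     a prov:Activity ;
>     prov:used <http://example.com/exampledoc.html> ;
>     prov:endedAtTime "2013-05-30T12:00:00Z"^^xsd:dateTime .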
>
> I'd be keen to collaborate on working out in more detail how NIF and 
> PROV-O could be combined.
>
> Cheers,
> Dave
>
> On 30 May 2013, at 11:24, Sebastian Hellmann 
> <hellmann at informatik.uni-leipzig.de> wrote:
>
>> Hi Steve,
>>> Thanks Felix, is there a difference though between making an 
>>> assertion about the document and making one about the string that 
>>> results from pre-processing the document? 
>>
>> documents are really tricky, technically as well as philosophically 
>> (abstract identity, ship of Theseus). Off the top of my head I 
>> couldn't even define exactly what "document" means.
>>
>> Basically you can never be certain what hides behind a document URL. 
>> Here are some examples:
>> 1. non-information resources: http://dbpedia.org/resource/London
>> 2. A multilingual CMS normally implements a fallback to English if a 
>> translated page is missing in, e.g., German. So while the document is 
>> nominally German, the content served would be English.
>> 3. http://www.w3.org/DesignIssues/LinkedData.html has been edited 
>> several times, most recently in 2009. What does the URI refer to in 
>> this case? All versions or only the latest?
>>
>> For NLP we should only use the document URL for information that does 
>> not concern the content itself. This makes everything much easier and 
>> more interoperable.
>>
>>> It's perhaps the difference between "this document has 300 words" 
>>> and "when I process this document like this it has 300 words". 
>> That is one major difficulty for interoperability. The latter is 
>> reproducible. nif:Context is a more granular modelling, as it points 
>> to the text itself. It doesn't really matter whether that text is a 
>> whole document or not (e.g. a sentence or paragraph), so you can 
>> actually model paragraphs the same way.
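>>
>> To make that concrete, here is a minimal Turtle sketch (invented 
>> example URIs, assuming the nif-core namespace and RFC 5147 style 
>> fragment identifiers). A paragraph is modelled exactly like the whole 
>> retrieved text, just with a narrower character range:
>>
>> @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
>> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
>>
>> # the whole retrieved text, 29 characters long
>> <http://example.com/exampledoc.html#char=0,29>
>>     a nif:Context , nif:RFC5147String ;
>>     nif:beginIndex "0"^^xsd:nonNegativeInteger ;
>>     nif:endIndex "29"^^xsd:nonNegativeInteger ;
>>     nif:isString "This is the text of the page." ;
>>     nif:sourceUrl <http://example.com/exampledoc.html> .
>>
>> # a paragraph, modelled the same way over a narrower range
>> <http://example.com/exampledoc.html#char=0,11>
>>     a nif:Context , nif:RFC5147String ;
>>     nif:beginIndex "0"^^xsd:nonNegativeInteger ;
>>     nif:endIndex "11"^^xsd:nonNegativeInteger ;
>>     nif:sourceUrl <http://example.com/exampledoc.html> .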
>>
>>
>>> I guess the question concerns a processing component that wants to 
>>> make an assertion in its output about the document as a whole so 
>>> that a subsequent step can use it. Should it use the input document 
>>> URI, or make an assertion about the character range that it used to 
>>> represent the document internally? Given that the character range 
>>> might differ between components, it would seem useful to have a way 
>>> of making assertions about the whole document that doesn't depend on 
>>> how it was pre-processed.
>>>
>>>     Can you give a triple and a SPARQL query that only works if we
>>>     drop #char=0,29 from the URI?
>>>
>>> Well, it would be the result of two components making assertions 
>>> about different character ranges, each believing that it is making 
>>> an assertion about the whole document.
>>
>> Wouldn't this be a client issue, i.e. how to merge this? How is this 
>> handled traditionally? nif:sourceUrl is currently still unstable and 
>> underspecified.
>> Maybe we can find a use case for this issue and then decide.
>> For the ITS use case this is completely irrelevant, because NIF is 
>> only used in the Web service conversion scenario, that is: ITS in 
>> HTML -> text (or NIF) -> NLP web service -> NIF output -> merge with 
>> ITS in HTML.
>>
>> There are several options (maybe more):
>> 1. Use a special identifier such as #char=0, (with no end index) to 
>> denote the whole character range. Then everything merges automatically.
>> 2. The client can merge annotations by copying them with a SPARQL 
>> CONSTRUCT along these lines (placeholder URIs; see the example below):
>>
>> # assuming the nif-core namespace
>> PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
>>
>> CONSTRUCT {
>>     <newUri#char=x,x> ?p ?o
>> } WHERE {
>>     ?context ?p ?o .
>>     ?context nif:sourceUrl <document> .
>> }
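>>
>> For example (an illustrative sketch; xx:wordcount is just the made-up 
>> property from Steve's mail, and the URIs are invented), suppose one 
>> component produced:
>>
>> @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
>> @prefix xx:  <http://example.org/xx#> .
>>
>> <http://example.com/exampledoc.html#char=0,25>
>>     nif:sourceUrl <http://example.com/exampledoc.html> ;
>>     xx:wordcount 5 .
>>
>> Running the CONSTRUCT with <newUri#char=x,x> replaced by the client's 
>> own context URI then copies xx:wordcount (and everything else attached 
>> to that context) onto the client's URI, so all annotations about the 
>> whole text end up on a single context.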
>>
>> Could you elaborate on what kind of annotations you are referring to?
>> Using the document URI makes sense for certain annotations (e.g. 
>> dc:publisher), but not for others (e.g. nif:count).
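>>
>> Roughly (again a sketch with invented URIs and values, reusing 
>> xx:wordcount from Steve's example as a stand-in for a 
>> content-dependent measurement):
>>
>> @prefix dc: <http://purl.org/dc/elements/1.1/> .
>> @prefix xx: <http://example.org/xx#> .
>>
>> # metadata about the document itself: attach it to the document URL
>> <http://example.com/exampledoc.html>
>>     dc:publisher "Example Publishing House" .
>>
>> # reproducible, content-dependent results: attach them to the context
>> # URI, which fixes the exact character range that was processed
>> <http://example.com/exampledoc.html#char=0,29>
>>     xx:wordcount 5 .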
>> All the best,
>> Sebastian
>>
>> On 30.05.2013 09:01, Steve Cassidy wrote:
>>> On 30 May 2013 16:39, Felix Sasaki <fsasaki at w3.org> wrote:
>>>
>>>     Well, to avoid the problem you need two pieces of information:
>>>     - document URI independent of complete character range
>>>     - document URI + complete character range
>>>     http://example.com/exampledoc.html#char=0,29 gives you both,
>>>     and the ability to distinguish between different calculations of
>>>     complete character ranges.
>>>
>>>
>>> <http://example.com/exampledoc.html#char=0,29> xx:wordcount 5 .
>>> <http://example.com/exampledoc.html> xx:wordcount 5 .
>>>
>>> These are two separate statements and not related unless we say
>>>
>>> <http://example.com/exampledoc.html>
>>>         xx:full_character_range <http://example.com/exampledoc.html#char=0,29> .
>>>
>>> which of course you could assert.
>>>
>>> I guess the question concerns a processing component that wants to 
>>> make an assertion in its output about the document as a whole so 
>>> that a subsequent step can use it. Should it use the input document 
>>> URI, or make an assertion about the character range that it used to 
>>> represent the document internally? Given that the character range 
>>> might differ between components, it would seem useful to have a way 
>>> of making assertions about the whole document that doesn't depend on 
>>> how it was pre-processed.
>>>
>>>     Can you give a triple and a SPARQL query that only works if we
>>>     drop #char=0,29 from the URI?
>>>
>>> Well, it would be the result of two components making assertions 
>>> about different character ranges, each believing that it is making 
>>> an assertion about the whole document.
>>>
>>> Steve
>>>
>>> -- 
>>> Department of Computing, Macquarie University
>>> http://web.science.mq.edu.au/~cassidy/
>>
>>
>> -- 
>> Dipl. Inf. Sebastian Hellmann
>> Department of Computer Science, University of Leipzig
>> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
>> Deadline: *July 8th*)
>> Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>> Research Group: http://aksw.org
>> _______________________________________________
>> NLP2RDF mailing list
>> NLP2RDF at lists.informatik.uni-leipzig.de
>> http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
Deadline: *July 8th*)
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org