[NLP2RDF] document and corpus level aggregates

Thu May 30 09:42:32 CEST 2013

On 30.05.13 09:01, Steve Cassidy wrote:
> On 30 May 2013 16:39, Felix Sasaki <fsasaki at w3.org> wrote:
>
>     Well, to avoid the problem you need two pieces of information:
>     - document URI independent of complete character range
>     - document URI + complete character range
>     http://example.com/exampledoc.html#=char=0,29 gives you both, and
>     the ability to distinguish between different calculations of
>     complete character ranges.
>
>
> <http://example.com/exampledoc.html#=char=0,29> xx:wordcount 5 .
> <http://example.com/exampledoc.html> xx:wordcount 5 .
>
> These are two separate statements and not related unless we say
>
> <http://example.com/exampledoc.html>
>         xx:full_character_range <http://example.com/exampledoc.html#=char=0,29> .
>
> which of course you could assert.
>
> I guess the question is this: a processing component wants to make an
> assertion in its output about the document as a whole so that a
> subsequent step can use it. Should it use the input document URI, or
> make an assertion about the character range that it used to represent
> the document internally? Given that the character range might differ
> between components, it would seem useful to have a way of making
> assertions about the whole document that does not depend on how it
> was pre-processed.
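To make the problem concrete, here is a minimal sketch in Python that uses plain tuples instead of an RDF store. It assumes the placeholder properties xx:wordcount and xx:full_character_range from above; the second component's range 0,31 is hypothetical. Two components each assert a word count about what they consider the whole document, and a lookup by the document URI finds nothing until the linking triples are asserted.

```python
DOC = "http://example.com/exampledoc.html"

# Each component describes "the whole document" with its own character range.
# The range 0,31 for component B is hypothetical (e.g. different whitespace handling).
triples = [
    (DOC + "#=char=0,29", "xx:wordcount", 5),  # component A
    (DOC + "#=char=0,31", "xx:wordcount", 5),  # component B
]


def wordcounts_for(doc, graph):
    """Return word counts asserted about `doc`, either directly or about a
    character range linked to it via xx:full_character_range."""
    ranges = {o for s, p, o in graph if s == doc and p == "xx:full_character_range"}
    subjects = {doc} | ranges
    return sorted(o for s, p, o in graph if s in subjects and p == "xx:wordcount")


print(wordcounts_for(DOC, triples))  # [] : nothing is said about DOC itself

# Link the document URI to every range a component used; the document
# now has two "full" ranges, one per component.
linked = triples + [
    (DOC, "xx:full_character_range", DOC + "#=char=0,29"),
    (DOC, "xx:full_character_range", DOC + "#=char=0,31"),
]
print(wordcounts_for(DOC, linked))  # [5, 5]
```

The lookup by document URI only succeeds because the linking triples were asserted explicitly. That is the extra step described above, and it has to be repeated for every range a component chose.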

I think you have the pre-processing information via
nif:wasConvertedFrom; see
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml
A URI like
http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/b[1])
gives you the source of the NIF data before the pre-processing.
nif:wasConvertedFrom is defined in
http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl
as a subproperty of prov:wasDerivedFrom.
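The document URI independent of the character range is the part of a range URI before the "#", so a consumer can always recover it. Below is a minimal sketch in Python. The helper name split_char_uri is hypothetical and not part of NIF. It accepts the RFC 5147 form "char=0,29" and the "=char=0,29" spelling used in this thread. It treats every other fragment, such as the xpath() form, as "not a character range".

```python
import re
from urllib.parse import urldefrag

# RFC 5147 writes "char=0,29"; this thread writes "=char=0,29". Accept both.
CHAR_RANGE = re.compile(r"^=?char=(\d+),(\d+)$")


def split_char_uri(uri):
    """Split a character-range URI into (document URI, begin, end).

    Returns (uri, None, None) if the fragment is not a character range.
    """
    doc, fragment = urldefrag(uri)
    match = CHAR_RANGE.match(fragment)
    if match is None:
        return uri, None, None
    return doc, int(match.group(1)), int(match.group(2))


print(split_char_uri("http://example.com/exampledoc.html#=char=0,29"))
# ('http://example.com/exampledoc.html', 0, 29)
print(split_char_uri("http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/b[1])"))
# the xpath() URI is returned unchanged, with None for begin and end
```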

Best,

Felix

>
>     Can you give a triple and a SPARQL query that only works if we
>     drop #=char=0,29 from the URI?
>
> Well, it would be the result of two components making assertions about
> different character ranges, each believing that it is making an
> assertion about the whole document.
>
> Steve
>
> -- 
> Department of Computing, Macquarie University
> http://web.science.mq.edu.au/~cassidy/ 
