[NLP2RDF] document and corpus level aggregates

Felix Sasaki fsasaki at w3.org
Thu May 30 08:39:01 CEST 2013


On 30.05.13 08:34, Steve Cassidy wrote:
>
>
>     The difference will be in the subject URIs: different tools might
>     do different preprocessing, leading to different subject URIs in
>     the assertions: e.g. in
>
>     http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml
>     you have as reference context
>     http://example.com/exampledoc.html#char=0,29
>     but you might have
>     http://example.com/exampledoc.html#char=0,30
>     When NIF representations produced by different extraction
>     chains are processed, e.g. in SPARQL queries, the difference
>     between 29 and 30 matters.
>
>
> Exactly, so if the _intention_ is to make an assertion about the 
> document, then http://example.com/exampledoc.html would be a more 
> appropriate subject URI. If the intention is to make an assertion 
> about the result of processing that document then the char range is 
> appropriate.
>
> It's perhaps the difference between "this document has 300 words" and 
> "when I process this document like this it has 300 words".
>
> The problem might come as you say when we try to aggregate results 
> from different chains each of which intended to make assertions about 
> the document as a whole but used different pre-processing giving 
> different offsets.

Well, to avoid the problem you need two pieces of information:
- document URI independent of complete character range
- document URI + complete character range
http://example.com/exampledoc.html#char=0,29 gives you both, and the 
ability to distinguish between different calculations of the complete 
character range.
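
As a rough illustration of how one URI carries both pieces, here is a 
minimal Python sketch. It is not part of NIF or of any NIF tooling; the 
helper name split_context_uri is hypothetical, and it assumes the 
fragment always has the form char=BEGIN,END as in the example above.

```python
import re
from urllib.parse import urldefrag


def split_context_uri(uri):
    """Split a char-range URI into (document URI, (begin, end)).

    Hypothetical helper: assumes a fragment of the form char=BEGIN,END.
    """
    # urldefrag separates the part before '#' from the fragment after it.
    doc_uri, fragment = urldefrag(uri)
    match = re.fullmatch(r"char=(\d+),(\d+)", fragment)
    if match is None:
        raise ValueError(f"no char range in fragment: {fragment!r}")
    return doc_uri, (int(match.group(1)), int(match.group(2)))


# Two extraction chains that computed different complete character ranges.
a = split_context_uri("http://example.com/exampledoc.html#char=0,29")
b = split_context_uri("http://example.com/exampledoc.html#char=0,30")

same_document = a[0] == b[0]   # both refer to the same document URI
same_range = a[1] == b[1]      # but the complete ranges differ
```

Here same_document is true while same_range is false, which is the 
distinction that would be lost if the fragment were dropped.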

Can you give a triple and a SPARQL query that only works if we drop 
#char=0,29 from the URI?

Best,

Felix
>
> Steve
> -- 
> Department of Computing, Macquarie University
> http://web.science.mq.edu.au/~cassidy/ 
