[NLP2RDF] document and corpus level aggregates
Felix Sasaki
fsasaki at w3.org
Thu May 30 08:39:01 CEST 2013
On 30.05.13 08:34, Steve Cassidy wrote:
>
>
> The difference will be in the subject URIs: different tools might
> do different preprocessing, leading to different subject URIs in
> the assertions, e.g. in
>
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml
> you have as reference context
> http://example.com/exampledoc.html#char=0,29
> but you might have
> http://example.com/exampledoc.html#char=0,30
> When processing NIF representations produced by different
> extraction chains, e.g. in SPARQL queries, the difference between 29
> and 30 matters.
>
>
> Exactly, so if the _intention_ is to make an assertion about the
> document, then http://example.com/exampledoc.html would be a more
> appropriate subject URI. If the intention is to make an assertion
> about the result of processing that document then the char range is
> appropriate.
>
> It's perhaps the difference between "this document has 300 words" and
> "when I process this document like this it has 300 words".
>
> The problem might arise, as you say, when we try to aggregate results
> from different chains, each of which intended to make assertions about
> the document as a whole but used different preprocessing, giving
> different offsets.
Well, to avoid the problem you need two pieces of information:
- document URI independent of complete character range
- document URI + complete character range
http://example.com/exampledoc.html#=char=0,29 gives you both, and the
ability to distinguish between different calculations of complete
character ranges.
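A minimal sketch of how both pieces can be recovered from a single context URI, assuming the fragment follows the "char=begin,end" pattern of the quoted NIF example (e.g. "#char=0,29"). The function name split_context_uri is hypothetical, and other fragment schemes are simply rejected.

```python
import re
from urllib.parse import urldefrag

# Assumption: fragments look like "char=begin,end", as in the quoted
# example URI; anything else is treated as an error.
_CHAR_FRAGMENT = re.compile(r"^char=(\d+),(\d+)$")


def split_context_uri(uri: str) -> tuple[str, tuple[int, int]]:
    """Hypothetical helper: return (document URI, (begin, end))."""
    doc_uri, fragment = urldefrag(uri)  # strips "#..." from the URI
    match = _CHAR_FRAGMENT.match(fragment)
    if match is None:
        raise ValueError(f"not a char=begin,end fragment: {uri!r}")
    return doc_uri, (int(match.group(1)), int(match.group(2)))


a = split_context_uri("http://example.com/exampledoc.html#char=0,29")
b = split_context_uri("http://example.com/exampledoc.html#char=0,30")

# Same document URI, but two different calculations of the complete range.
print(a[0] == b[0])  # True
print(a[1], b[1])    # (0, 29) (0, 30)
```

The document URI is obtained by dropping the fragment, while the range stays available to tell the two extraction chains apart.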
Can you give a triple and a SPARQL query that only works if we drop
#=char=0,29 from the URI?
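The aggregation case Steve describes can be sketched in the same spirit: triples from two chains that disagree on the end offset are grouped by document URI (fragment dropped at query time), while the full context URIs are kept for chain-level comparison. The predicate "ex:wordCount" and the count values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from urllib.parse import urldefrag

# Hypothetical (subject, predicate, object) triples from two extraction
# chains; predicate name and counts are illustrative only.
triples = [
    ("http://example.com/exampledoc.html#char=0,29", "ex:wordCount", 5),
    ("http://example.com/exampledoc.html#char=0,30", "ex:wordCount", 5),
]

by_document = defaultdict(set)  # document-level view, fragment dropped
by_context = defaultdict(set)   # chain-level view, full context URI kept

for subject, predicate, value in triples:
    if predicate != "ex:wordCount":
        continue
    doc_uri, _fragment = urldefrag(subject)
    by_document[doc_uri].add(value)
    by_context[subject].add(value)

print(dict(by_document))  # {'http://example.com/exampledoc.html': {5}}
print(len(by_context))    # 2
```

In SPARQL 1.1 a comparable grouping could presumably use STRBEFORE(STR(?s), "#") on the subject, though that query is not shown or tested here.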
Best,
Felix
>
> Steve
> --
> Department of Computing, Macquarie University
> http://web.science.mq.edu.au/~cassidy/