[NLP2RDF] document and corpus level aggregates
Felix Sasaki
fsasaki at w3.org
Thu May 30 09:42:32 CEST 2013
On 30.05.13 09:01, Steve Cassidy wrote:
> On 30 May 2013 16:39, Felix Sasaki <fsasaki at w3.org> wrote:
>
> Well, to avoid the problem you need two pieces of information:
> - document URI independent of complete character range
> - document URI + complete character range
> http://example.com/exampledoc.html#=char=0,29 gives you both, and
> the ability to distinguish between different calculations of
> complete character ranges.
>
>
> <http://example.com/exampledoc.html#=char=0,29> xx:wordcount 5 .
> <http://example.com/exampledoc.html> xx:wordcount 5 .
>
> These are two separate statements and not related unless we say
>
> <http://example.com/exampledoc.html>
> xx:full_character_range
> <http://example.com/exampledoc.html#=char=0,29> .
>
> which of course you could assert.
>
> I guess the question is about a processing component that wants to make
> an assertion in its output about the document as a whole, so that a
> subsequent step can use it. Should it use the input document URI, or
> make an assertion about the character range that it used to represent
> the document internally? Given that the character range might differ
> between components, it would seem useful to have a way of making
> assertions about the whole document that doesn't depend on how it was
> pre-processed.
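To make the quoted problem concrete, here is a minimal sketch in plain Python, with triples modelled as tuples and no RDF library. The names xx:wordcount and xx:full_character_range are taken from the quoted example; the second range (#=char=0,31), its word count of 6, and the helper wordcounts_for_document are assumptions made up purely for illustration.

```python
DOC = "http://example.com/exampledoc.html"
RANGE_A = DOC + "#=char=0,29"  # range as computed by one component
RANGE_B = DOC + "#=char=0,31"  # hypothetical: another component counted differently

# Each component asserts a word count about "its" whole-document range.
triples = {
    (RANGE_A, "xx:wordcount", 5),
    (RANGE_B, "xx:wordcount", 6),  # hypothetical value
}


def wordcounts_for_document(graph, doc_uri):
    """Word counts stated about doc_uri itself or about any range that the
    graph links to it via xx:full_character_range."""
    subjects = {doc_uri}
    subjects |= {o for (s, p, o) in graph
                 if s == doc_uri and p == "xx:full_character_range"}
    return sorted(o for (s, p, o) in graph
                  if s in subjects and p == "xx:wordcount")


# Without an explicit link, nothing is known about the document URI.
unlinked = wordcounts_for_document(triples, DOC)

# Once the ranges are linked to the document, both statements are found,
# and the disagreement between the components becomes visible.
linked_graph = triples | {
    (DOC, "xx:full_character_range", RANGE_A),
    (DOC, "xx:full_character_range", RANGE_B),
}
linked = wordcounts_for_document(linked_graph, DOC)
```

In this sketch a lookup by document URI only works after the linking statements have been asserted, which is the dependency on pre-processing that the quoted text is worried about.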
I think you have the pre-processing information via
nif:wasConvertedFrom, see
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml
A URI like
http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/b[1])
gives you the source of the NIF data before the pre-processing.
nif:wasConvertedFrom is defined in
http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl
as a sub property of prov:wasDerivedFrom.
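A rough sketch of how a later step could follow that provenance link, again in plain Python with tuples rather than a real RDF store. Pairing the #=char=0,29 range with this particular XPath URI is an assumption for illustration only, and the helper derived_from and the SUBPROPERTY_OF table are hypothetical names, not part of NIF or PROV.

```python
# Sub property relation as stated above: nif:wasConvertedFrom is a
# sub property of prov:wasDerivedFrom.
SUBPROPERTY_OF = {"nif:wasConvertedFrom": "prov:wasDerivedFrom"}

CHAR_RANGE = "http://example.com/exampledoc.html#=char=0,29"
XPATH_SOURCE = "http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/b[1])"

# Assumed pairing of range and source, for illustration only.
triples = [
    (CHAR_RANGE, "nif:wasConvertedFrom", XPATH_SOURCE),
]


def derived_from(graph, subject):
    """Objects related to subject by prov:wasDerivedFrom, either directly
    or through a declared sub property."""
    return [o for (s, p, o) in graph
            if s == subject
            and (p == "prov:wasDerivedFrom"
                 or SUBPROPERTY_OF.get(p) == "prov:wasDerivedFrom")]


sources = derived_from(triples, CHAR_RANGE)

# Splitting at the first '#' recovers the document URI that is
# independent of any character range, plus the pre-conversion fragment.
document_uri, _, fragment = sources[0].partition("#")
```

Here a query phrased in terms of prov:wasDerivedFrom still finds the source, and the part before the fragment gives a document identifier that does not depend on how the character range was computed.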
Best,
Felix
>
> Can you give a triple and a sparql query that only works if we
> drop #=char=0,29 from the URI?
>
> Well, it would be the result of two components making assertions about
> different character ranges each believing that it is making an
> assertion about the whole document.
>
> Steve
>
> --
> Department of Computing, Macquarie University
> http://web.science.mq.edu.au/~cassidy/