<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 30.05.13 08:34, Steve
Cassidy wrote:<br>
</div>
<blockquote
cite="mid:CADg8aoh+S13GPEf=8_X-DXCdvnOaXjzVE+Ffy23OO0suJn2_ow@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> The difference will
be in the subject URIs: different tools might do
different preprocessing, leading to different subject
URIs in the assertions, e.g. in<br>
<br>
<a moz-do-not-send="true"
href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml"
target="_blank">http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml</a><br>
you have as reference context<br>
<a moz-do-not-send="true"
href="http://example.com/exampledoc.html#char=0,29"
target="_blank">http://example.com/exampledoc.html#char=0,29</a><br>
but you might have<br>
<a moz-do-not-send="true"
href="http://example.com/exampledoc.html#char=0,30"
target="_blank">http://example.com/exampledoc.html#char=0,30</a><br>
When NIF representations produced by different
extraction chains are processed, e.g. in SPARQL queries,
the difference between 29 and 30 matters.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">Exactly, so if the _intention_ is to make an
assertion about the document, then <a
moz-do-not-send="true"
href="http://example.com/exampledoc.html">http://example.com/exampledoc.html</a>
would be a more appropriate subject URI. If the intention
is to make an assertion about the result of processing
that document then the char range is appropriate. </div>
<div style=""><br>
</div>
<div style="">It's perhaps the difference between "this
document has 300 words" and "when I process this document
like this it has 300 words". </div>
<div style=""><br>
</div>
<div style="">
The problem might come, as you say, when we try to aggregate
results from different chains, each of which intended to
make assertions about the document as a whole but used
different pre-processing, giving different offsets. <br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Well, to avoid the problem you need two pieces of information:<br>
- document URI independent of complete character range<br>
- document URI + complete character range <br>
<a class="moz-txt-link-freetext" href="http://example.com/exampledoc.html#char=0,29">http://example.com/exampledoc.html#char=0,29</a> gives you both, and
the ability to distinguish between different calculations of
complete character ranges.<br>
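<br>
As a rough illustration only (not taken from the NIF drafts), the two pieces of information can be recovered from one such URI by splitting off the fragment. The helper name <code>split_char_uri</code> is hypothetical, and the sketch assumes the fragment always has the form <code>char=start,end</code> as in the examples above:<br>
<pre>
```python
from urllib.parse import urldefrag


def split_char_uri(uri):
    """Split a URI into (document URI, (start, end)).

    Hypothetical helper: assumes a fragment of the form "char=start,end".
    Returns (document URI, None) when no such fragment is present.
    """
    doc, fragment = urldefrag(uri)
    prefix = "char="
    if not fragment.startswith(prefix):
        return doc, None
    start, end = fragment[len(prefix):].split(",")
    return doc, (int(start), int(end))


# The two context URIs from the example, produced by different chains.
a = split_char_uri("http://example.com/exampledoc.html#char=0,29")
b = split_char_uri("http://example.com/exampledoc.html#char=0,30")

same_document = a[0] == b[0]  # document URI, independent of the range
same_range = a[1] == b[1]     # complete character range as calculated
```
</pre>
Under these assumptions, the two URIs agree on the document part and differ only in the range, so both the document and the particular calculation of its character range stay identifiable.<br>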
<br>
Can you give a triple and a SPARQL query that only works if we drop
#char=0,29 from the URI?<br>
<br>
Best,<br>
<br>
Felix<br>
<blockquote
cite="mid:CADg8aoh+S13GPEf=8_X-DXCdvnOaXjzVE+Ffy23OO0suJn2_ow@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div style=""><br>
</div>
<div style="">Steve</div>
</div>
-- <br>
Department of Computing, Macquarie University
<div><a moz-do-not-send="true"
href="http://web.science.mq.edu.au/%7Ecassidy/"
target="_blank">http://web.science.mq.edu.au/~cassidy/</a></div>
</div>
</div>
</blockquote>
<br>
</body>
</html>