<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 30.05.13 08:24, Steve
Cassidy wrote:<br>
</div>
<blockquote
cite="mid:CADg8aoiqbzUMaBWrG_Gi-B1jYY1R3VqeSa2Kebuy+P7qOTR1wg@mail.gmail.com"
type="cite">
<div dir="ltr">Thanks Felix, is there a difference though between
making an assertion about the document and making one about the
string that results from pre-processing the document? <br>
</div>
</blockquote>
<br>
The difference is in the subject URIs: different tools may
preprocess the document differently, which leads to different
subject URIs in the assertions. For example, in<br>
<br>
<a class="moz-txt-link-freetext" href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml">http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml</a><br>
the reference context is<br>
<a class="moz-txt-link-freetext" href="http://example.com/exampledoc.html#char=0,29">http://example.com/exampledoc.html#char=0,29</a><br>
but another tool might produce<br>
<a class="moz-txt-link-freetext" href="http://example.com/exampledoc.html#char=0,30">http://example.com/exampledoc.html#char=0,30</a><br>
When you combine NIF representations produced by different
extraction chains, e.g. in SPARQL queries, the difference between 29
and 30 matters, since the two URIs are distinct subjects and will not
match each other.<br>
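<br>
As a rough illustration (a sketch only: the two preprocessing functions and the sample text are made up here and are not taken from any real NIF tool or from the linked file), this is how two chains can end up with different context URIs for the same document:<br>
<pre>
```python
import re

# Document URI from the example above.
BASE = "http://example.com/exampledoc.html"

def context_uri(base, text):
    # Context URI in the "#char=begin,end" style used in this thread.
    return "{}#char=0,{}".format(base, len(text))

def chain_a(raw):
    # Hypothetical tool A: collapse whitespace runs, trim both ends.
    return re.sub(r"\s+", " ", raw).strip()

def chain_b(raw):
    # Hypothetical tool B: collapse whitespace runs, trim only the start.
    return re.sub(r"\s+", " ", raw).lstrip()

# Made-up sample text with stray whitespace.
raw = "  Welcome to   Dublin in Ireland! \n"

uri_a = context_uri(BASE, chain_a(raw))
uri_b = context_uri(BASE, chain_b(raw))

# Annotations from each chain, keyed by their reference context.
annotations_a = {uri_a: ["annotation from tool A"]}
annotations_b = {uri_b: ["annotation from tool B"]}

# Subjects present in both result sets; empty when the offsets differ.
shared = set(annotations_a).intersection(annotations_b)

print(uri_a)
print(uri_b)
print(shared)
```
</pre>
The single trailing space kept by the second chain is enough to change the end offset, so a join on the context URI finds nothing in common.<br>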
<br>
Best,<br>
<br>
Felix<br>
<br>
<blockquote
cite="mid:CADg8aoiqbzUMaBWrG_Gi-B1jYY1R3VqeSa2Kebuy+P7qOTR1wg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div style="">It's probably not an important point but it seems
odd to me to qualify it in this way.</div>
<div style=""><br>
</div>
<div style="">Steve</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On 30 May 2013 16:14, Felix Sasaki <span
dir="ltr"><<a moz-do-not-send="true"
href="mailto:fsasaki@w3.org" target="_blank">fsasaki@w3.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 30.05.13 08:07, Steve Cassidy wrote:<br>
</div>
<div class="im">
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>
The basic unit in NIF is the nif:Context, so
the document level is covered when the string
in a nif:Context equals the content of a
document. </blockquote>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">...<br>
<Alcoholism.txt#char=37028,37043><br>
a nif:RFC5147String ;<br>
nif:beginIndex "37028" ;<br>
nif:endIndex "37043" ;<br>
itsrdf:taIdentRef <<a
moz-do-not-send="true"
href="http://dbpedia.org/resource/Benzodiazepine"
target="_blank">http://dbpedia.org/resource/Benzodiazepine</a>>
;<br>
nif:referenceContext
<Alcoholism.txt#char=0,91429> .<br>
</blockquote>
<div><br>
</div>
<div>Just wondering why you don't use
<Alcoholism.txt> when making assertions
about the document as a whole rather than
giving the entire character range as a
qualifier. </div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
Hi Steve,<br>
<br>
Sebastian may have a different answer, but here is my view,
based on how this is used in ITS 2.0: when you convert a
document like<br>
<a moz-do-not-send="true"
href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-HTML-whitespace-normalization"
target="_blank">http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-HTML-whitespace-normalization</a><br>
to NIF, you make many decisions about what to drop
(whitespace nodes, the content of HTML "head", or "script"
inside "body") and how to segment (e.g. extracting the
content of "span" not separately but as part of "p").
Together, nif:referenceContext and nif:isString state
unambiguously what the complete extracted string is.<br>
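<br>
To make that concrete, here is a small sketch of how a consumer could recover the annotated text from nif:referenceContext and nif:isString. The triples are plain Python tuples rather than real RDF, and the helper names, the sample string and the DBpedia URI are assumptions chosen for illustration:<br>
<pre>
```python
BASE = "http://example.com/exampledoc.html"

def nif_context(base, text):
    # Describe the complete extracted string as a nif:Context.
    uri = "{}#char=0,{}".format(base, len(text))
    return uri, [
        (uri, "rdf:type", "nif:Context"),
        (uri, "nif:isString", text),
    ]

def nif_annotation(base, context, begin, end, ident_ref):
    # Describe one annotated substring, pointing back at its context.
    uri = "{}#char={},{}".format(base, begin, end)
    return uri, [
        (uri, "rdf:type", "nif:RFC5147String"),
        (uri, "nif:beginIndex", str(begin)),
        (uri, "nif:endIndex", str(end)),
        (uri, "itsrdf:taIdentRef", ident_ref),
        (uri, "nif:referenceContext", context),
    ]

def anchored_text(triples, annotation_uri):
    # Follow nif:referenceContext to nif:isString, then slice by offsets.
    props = {p: o for s, p, o in triples if s == annotation_uri}
    context = props["nif:referenceContext"]
    full = next(o for s, p, o in triples
                if s == context and p == "nif:isString")
    return full[int(props["nif:beginIndex"]):int(props["nif:endIndex"])]

# Made-up extracted string; a different extraction chain could yield another.
text = "Welcome to Dublin in Ireland!"
ctx_uri, triples = nif_context(BASE, text)
ann_uri, ann_triples = nif_annotation(
    BASE, ctx_uri, 11, 17, "http://dbpedia.org/resource/Dublin")
triples = triples + ann_triples

print(anchored_text(triples, ann_uri))
```
</pre>
Because the context carries the full string, the offsets can be checked against exactly the text that the extraction produced, whatever was dropped on the way.<br>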
<br>
Best,<br>
<br>
Felix<br>
<br>
<blockquote type="cite">
<div class="im">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
                        <div> Presumably the same assertion would be
                          true of <Alcoholism.txt#char=0,91427>
                          too, but if you are trying to encode
                          document-level metadata and you have an identifier for
                          the document, why not use it? </div>
<div><br>
</div>
<div>Steve</div>
<div> </div>
</div>
-- <br>
Department of Computing, Macquarie University
<div><a moz-do-not-send="true"
href="http://web.science.mq.edu.au/%7Ecassidy/"
target="_blank">http://web.science.mq.edu.au/~cassidy/</a></div>
</div>
</div>
<br>
<br>
</div>
<div class="im">
<pre>_______________________________________________
NLP2RDF mailing list
<a moz-do-not-send="true" href="mailto:NLP2RDF@lists.informatik.uni-leipzig.de" target="_blank">NLP2RDF@lists.informatik.uni-leipzig.de</a>
<a moz-do-not-send="true" href="http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf" target="_blank">http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf</a>
</pre>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</body>
</html>