<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 30.05.13 08:07, Steve
Cassidy wrote:<br>
</div>
<blockquote
cite="mid:CADg8aoiuc00hmyOK=v2YENFbwuF-458E_ETgZ=5rM7p8R9PP7Q@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>
The basic unit in NIF is the nif:Context, so the
document-level is covered, when the string in a
nif:Context equals the content of a document. </blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">...<br>
&lt;Alcoholism.txt#char=37028,37043&gt;<br>
a nif:RFC5147String ;<br>
nif:beginIndex "37028" ;<br>
nif:endIndex "37043" ;<br>
itsrdf:taIdentRef &lt;<a moz-do-not-send="true"
href="http://dbpedia.org/resource/Benzodiazepine"
target="_blank">http://dbpedia.org/resource/Benzodiazepine</a>&gt;
;<br>
nif:referenceContext
&lt;Alcoholism.txt#char=0,91429&gt; .<br>
</blockquote>
<div><br>
</div>
<div style="">Just wondering why you don't use
&lt;Alcoholism.txt&gt; when making assertions about the
document as a whole rather than giving the entire
character range as a qualifier. </div>
</div>
</div>
</div>
</blockquote>
<br>
Hi Steve,<br>
<br>
Sebastian may have a different answer, but here is my view, based on
how this is used in ITS 2.0: when you convert a document like<br>
<a class="moz-txt-link-freetext" href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-HTML-whitespace-normalization">http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-HTML-whitespace-normalization</a><br>
to NIF, you make many decisions about what to drop (white space
nodes, content of HTML "head" or "script" inside "body") and how to
segment (e.g. not extracting the content of "span" separately but
rather as part of "p"). Together with nif:isString,
nif:referenceContext states clearly what the complete extracted
string is.<br>
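<br>
As a rough illustration of this point (a sketch under assumptions, not
the ITS 2.0 conversion algorithm): the sample string and the helper
names normalize and describe below are hypothetical, and whitespace
collapsing stands in for just one of the possible extraction decisions.
The same word ends up at different offsets depending on how the string
was extracted, which is why the extracted string is recorded
explicitly.<br>

```python
def normalize(raw):
    # One possible extraction decision (an assumption for this sketch):
    # collapse every run of whitespace into a single space.
    return " ".join(raw.split())


def describe(base, text, word):
    # Build the values that would go into the NIF description.
    # "base" is the document identifier, "text" the extracted string.
    begin = text.index(word)
    end = begin + len(word)
    return {
        # Fragment covering the whole extracted string (the nif:Context).
        "context": f"{base}#char=0,{len(text)}",
        # Value for nif:isString: the complete extracted string.
        "isString": text,
        # Fragment for the annotated substring.
        "substring": f"{base}#char={begin},{end}",
        "beginIndex": begin,
        "endIndex": end,
    }


# Hypothetical document content, not taken from Alcoholism.txt.
raw = "  Alcohol   withdrawal\n is treated with\tbenzodiazepines. "

as_is = describe("Alcoholism.txt", raw, "benzodiazepines")
normalized = describe("Alcoholism.txt", normalize(raw), "benzodiazepines")

print(as_is["substring"], as_is["context"])
print(normalized["substring"], normalized["context"])
```

Both results point at the same word, but the offsets only make sense
relative to the string stored with the context they reference.<br>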
<br>
Best,<br>
<br>
Felix<br>
<br>
<blockquote
cite="mid:CADg8aoiuc00hmyOK=v2YENFbwuF-458E_ETgZ=5rM7p8R9PP7Q@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div style=""> Presumably the same assertion would be true
of &lt;Alcoholism.txt#char=0,91427&gt; too but if you are
trying to encode document level meta-data and you have an
identifier for the document, why not use it? </div>
<div style=""><br>
</div>
<div style="">Steve</div>
<div> </div>
</div>
-- <br>
Department of Computing, Macquarie University
<div><a moz-do-not-send="true"
href="http://web.science.mq.edu.au/%7Ecassidy/"
target="_blank">http://web.science.mq.edu.au/~cassidy/</a></div>
</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>