<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 30.05.13 08:24, Steve Cassidy
      wrote:<br>
    </div>
    <blockquote
cite="mid:CADg8aoiqbzUMaBWrG_Gi-B1jYY1R3VqeSa2Kebuy+P7qOTR1wg@mail.gmail.com"
      type="cite">
      <div dir="ltr">Thanks Felix, is there a difference though between
        making an assertion about the document and making one about the
        string that results from pre-processing the document? <br>
      </div>
    </blockquote>
    <br>
    The difference is in the subject URIs: different tools might apply
    different preprocessing, which leads to different subject URIs in
    the assertions. For example, in<br>
    <br>
<a class="moz-txt-link-freetext" href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml">http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml</a><br>
    the reference context is<br>
    <a class="moz-txt-link-freetext" href="http://example.com/exampledoc.html#char=0,29">http://example.com/exampledoc.html#char=0,29</a><br>
    but another tool might produce<br>
    <a class="moz-txt-link-freetext" href="http://example.com/exampledoc.html#char=0,30">http://example.com/exampledoc.html#char=0,30</a><br>
    When you process NIF representations produced by different
    extraction chains, e.g. in SPARQL queries, the difference between
    29 and 30 matters, because the two URIs identify different resources.<br>
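    <br>
    As a rough sketch of how two extraction chains can end up with
    different context URIs for the same document: the raw text and the
    two normalization rules below are assumptions made up for
    illustration, and context_uri is a hypothetical helper, not part of
    NIF or ITS 2.0.<br>

```python
import re


def context_uri(doc_uri: str, text: str) -> str:
    """Hypothetical helper: build a #char=0,N context URI over the whole string."""
    return f"{doc_uri}#char=0,{len(text)}"


DOC = "http://example.com/exampledoc.html"

# Assumed raw text as it might come out of the document (note the
# double space and the trailing newline).
raw = "Welcome to  Dublin in Ireland!\n"

# Assumed tool A: collapses every whitespace run to one space and trims.
text_a = re.sub(r"\s+", " ", raw).strip()

# Assumed tool B: only removes the trailing newline.
text_b = raw.rstrip("\n")

uri_a = context_uri(DOC, text_a)
uri_b = context_uri(DOC, text_b)

print(uri_a)
print(uri_b)
print(uri_a == uri_b)
```

    The two strings differ by a single space, yet the resulting URIs
    are not equal, so a query that matches on one of them will not find
    statements made about the other.<br>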
    <br>
    Best,<br>
    <br>
    Felix<br>
    <br>
    <blockquote
cite="mid:CADg8aoiqbzUMaBWrG_Gi-B1jYY1R3VqeSa2Kebuy+P7qOTR1wg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div style="">It's probably not an important point but it seems
          odd to me to qualify it in this way.</div>
        <div style=""><br>
        </div>
        <div style="">Steve</div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On 30 May 2013 16:14, Felix Sasaki <span
            dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:fsasaki@w3.org" target="_blank">fsasaki@w3.org</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000">
              <div>On 30.05.13 08:07, Steve Cassidy wrote:<br>
              </div>
              <div class="im">
                <blockquote type="cite">
                  <div dir="ltr"><br>
                    <div class="gmail_extra">
                      <div class="gmail_quote">
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>
                          The basic unit in NIF is the nif:Context, so
                          the document-level is covered, when the string
                          in a nif:Context equals the content of a
                          document.&nbsp;</blockquote>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">...<br>
                          &lt;Alcoholism.txt#char=37028,37043&gt;<br>
                          &nbsp; &nbsp; &nbsp; &nbsp; a &nbsp;nif:RFC5147String ;<br>
                          &nbsp; &nbsp; &nbsp; &nbsp; nif:beginIndex "37028" ;<br>
                          &nbsp; &nbsp; &nbsp; &nbsp; nif:endIndex "37043" ;<br>
                          &nbsp; &nbsp; &nbsp; &nbsp; itsrdf:taIdentRef &lt;<a
                            moz-do-not-send="true"
                            href="http://dbpedia.org/resource/Benzodiazepine"
                            target="_blank">http://dbpedia.org/resource/Benzodiazepine</a>&gt;

                          ;<br>
                          &nbsp; &nbsp; &nbsp; &nbsp; nif:referenceContext
                          &lt;Alcoholism.txt#char=0,91429&gt; &nbsp;.<br>
                        </blockquote>
                        <div><br>
                        </div>
                        <div>Just wondering why you don't use
                          &lt;Alcoholism.txt&gt; when making assertions
                          about the document as a whole rather than
                          giving the entire character range as a
                          qualifier. </div>
                      </div>
                    </div>
                  </div>
                </blockquote>
                <br>
              </div>
              Hi Steve,<br>
              <br>
              Sebastian may have a different answer, but here is my view
              based on how this is used in ITS 2.0: when you convert a
              document like<br>
              <a moz-do-not-send="true"
href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-HTML-whitespace-normalization"
                target="_blank">http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-HTML-whitespace-normalization</a><br>
              to NIF, you make a lot of decisions about what to drop
              (white space nodes, the content of HTML "head", or "script"
              inside "body") and how to segment (e.g. extracting the
              content of "span" not separately but as part of "p").
              Together with nif:isString, nif:referenceContext gives you
              clear information about what the complete extracted string is.<br>
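              <br>
              The kind of decisions described above could be sketched
              roughly as follows. This is only an illustration under
              assumptions: TextExtractor and extract_context are
              hypothetical names, the sample markup is invented, and the
              rules (skip "head" and "script", keep inline content as
              part of the surrounding text, collapse white space) are
              one possible choice, not the ITS 2.0 conversion
              algorithm.<br>

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical extractor: collects text, skipping head and script."""

    SKIPPED = {"head", "script"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than zero while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Inline elements such as span are not segmented separately:
        # their text is simply appended to the running string.
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_context(doc_uri, markup):
    """Return (context URI, extracted string) for the whole document."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Assumed normalization: collapse white space runs and trim the ends.
    text = " ".join("".join(parser.chunks).split())
    return f"{doc_uri}#char=0,{len(text)}", text


# Invented sample document for illustration only.
sample = (
    "<html><head><title>Demo</title></head>"
    "<body><script>var x = 1;</script>"
    "<p>Welcome to <span>Dublin</span>  in Ireland!</p></body></html>"
)

uri, text = extract_context("http://example.com/exampledoc.html", sample)
print(uri)
print(text)
```

              Here the extracted string is what nif:isString would
              carry, and the URI is what other strings would point to
              via nif:referenceContext; a tool with different rules
              would yield a different string and hence a different
              URI.<br>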
              <br>
              Best,<br>
              <br>
              Felix<br>
              <br>
              <blockquote type="cite">
                <div class="im">
                  <div dir="ltr">
                    <div class="gmail_extra">
                      <div class="gmail_quote">
                        <div>&nbsp;Presumably the same assertion would be
                          true of &lt;Alcoholism.txt#char=0,91427&gt;
                          &nbsp;too but if you are trying to encode document
                          level meta-data and you have an identifier for
                          the document, why not use it?&nbsp;</div>
                        <div><br>
                        </div>
                        <div>Steve</div>
                        <div>&nbsp;</div>
                      </div>
                      -- <br>
                      Department of Computing, Macquarie University
                      <div><a moz-do-not-send="true"
                          href="http://web.science.mq.edu.au/%7Ecassidy/"
                          target="_blank">http://web.science.mq.edu.au/~cassidy/</a></div>
                    </div>
                  </div>
                  <br>
                  <br>
                </div>
                <div class="im">
                  <pre>_______________________________________________
NLP2RDF mailing list
<a moz-do-not-send="true" href="mailto:NLP2RDF@lists.informatik.uni-leipzig.de" target="_blank">NLP2RDF@lists.informatik.uni-leipzig.de</a>
<a moz-do-not-send="true" href="http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf" target="_blank">http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf</a>
</pre>
                </div>
              </blockquote>
              <br>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        Department of Computing, Macquarie University
        <div><a moz-do-not-send="true"
            href="http://web.science.mq.edu.au/%7Ecassidy/"
            target="_blank">http://web.science.mq.edu.au/~cassidy/</a></div>
      </div>
    </blockquote>
    <br>
  </body>
</html>