<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 30.05.13 08:24, Steve Cassidy
      wrote:<br>

    </div>

    <blockquote

cite="mid:CADg8aoiqbzUMaBWrG_Gi-B1jYY1R3VqeSa2Kebuy+P7qOTR1wg@mail.gmail.com"

      type="cite">

      <div dir="ltr">Thanks Felix. Is there a difference, though, between
        making an assertion about the document and making one about the
        string that results from pre-processing the document? <br>

      </div>

    </blockquote>

    <br>

    The difference will be in the subject URIs: different tools might do
    different preprocessing, which leads to different subject URIs in the
    assertions. For example, in<br>
    <br>
<a class="moz-txt-link-freetext" href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml">http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/nif/EX-nif-conversion-output.xml</a><br>
    the reference context is<br>
    <a class="moz-txt-link-freetext" href="http://example.com/exampledoc.html#char=0,29">http://example.com/exampledoc.html#char=0,29</a><br>
    but another extraction chain might produce<br>
    <a class="moz-txt-link-freetext" href="http://example.com/exampledoc.html#char=0,30">http://example.com/exampledoc.html#char=0,30</a><br>
    When you process NIF representations produced by different
    extraction chains, e.g. in SPARQL queries, the difference between 29
    and 30 matters: the two URIs identify different resources, so a query
    that matches on one will not find assertions made about the other.<br>
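    <br>
    To make the 29 vs. 30 point concrete, here is a minimal Python sketch.
    It is not taken from the NIF or ITS 2.0 specifications: it assumes two
    hypothetical extraction chains that differ only in whether a trailing
    newline is kept, and it builds RFC 5147-style context URIs (base URI
    plus "#char=0,N") from the extracted strings. The helper name
    context_uri and the sample text are illustrative assumptions, and
    offsets are assumed to count Unicode code points, as Python's len does.<br>
<pre>
```python
# Minimal sketch: two hypothetical extraction chains produce slightly
# different strings from the same document, so their RFC 5147-style
# context URIs differ. Offsets are assumed to count Unicode code points.

BASE = "http://example.com/exampledoc.html"


def context_uri(base, text):
    """Build a '#char=0,N' URI covering the whole extracted string (hypothetical helper)."""
    return base + "#char=0," + str(len(text))


# Chain A strips the trailing newline; chain B keeps it (illustrative assumption).
raw = "Welcome to Dublin in Ireland!\n"
chain_a = raw.rstrip("\n")
chain_b = raw

uri_a = context_uri(BASE, chain_a)
uri_b = context_uri(BASE, chain_b)

print(uri_a)  # http://example.com/exampledoc.html#char=0,29
print(uri_b)  # http://example.com/exampledoc.html#char=0,30

# RDF compares IRIs by simple string comparison, so a query that joins
# on chain A's context URI finds nothing in chain B's output.
print(uri_a == uri_b)  # False
```
</pre>
    The only difference between the two chains is one newline character,
    yet the resulting subject URIs no longer match.<br>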

    <br>

    Best,<br>

    <br>

    Felix<br>

    <br>

    <blockquote

cite="mid:CADg8aoiqbzUMaBWrG_Gi-B1jYY1R3VqeSa2Kebuy+P7qOTR1wg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div style="">It's probably not an important point, but it seems
          odd to me to qualify it in this way.</div>

        <div style=""><br>

        </div>

        <div style="">Steve</div>

      </div>

      <div class="gmail_extra"><br>

        <br>

        <div class="gmail_quote">On 30 May 2013 16:14, Felix Sasaki <span

            dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:fsasaki@w3.org" target="_blank">fsasaki@w3.org</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000">

              <div>On 30.05.13 08:07, Steve Cassidy wrote:<br>

              </div>

              <div class="im">

                <blockquote type="cite">

                  <div dir="ltr"><br>

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <blockquote class="gmail_quote"

                          style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>

                          The basic unit in NIF is the nif:Context, so
                          the document level is covered when the string
                          in a nif:Context equals the content of a
                          document.</blockquote>

                        <blockquote class="gmail_quote"

                          style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">...<br>

                          &lt;Alcoholism.txt#char=37028,37043&gt;<br>

                          &nbsp; &nbsp; &nbsp; &nbsp; a &nbsp;nif:RFC5147String ;<br>

                          &nbsp; &nbsp; &nbsp; &nbsp; nif:beginIndex "37028" ;<br>

                          &nbsp; &nbsp; &nbsp; &nbsp; nif:endIndex "37043" ;<br>

                          &nbsp; &nbsp; &nbsp; &nbsp; itsrdf:taIdentRef &lt;<a

                            moz-do-not-send="true"

                            href="http://dbpedia.org/resource/Benzodiazepine"

                            target="_blank">http://dbpedia.org/resource/Benzodiazepine</a>&gt;

                          ;<br>

                          &nbsp; &nbsp; &nbsp; &nbsp; nif:referenceContext

                          &lt;Alcoholism.txt#char=0,91429&gt; &nbsp;.<br>

                        </blockquote>

                        <div><br>

                        </div>

                        <div>Just wondering why you don't use

                          &lt;Alcoholism.txt&gt; when making assertions

                          about the document as a whole rather than

                          giving the entire character range as a

                          qualifier. </div>

                      </div>

                    </div>

                  </div>

                </blockquote>

                <br>

              </div>

              Hi Steve,<br>

              <br>

              Sebastian may have a different answer, but here is my view,
              based on how this is used in ITS 2.0: when you convert a
              document like<br>
              <a moz-do-not-send="true"
href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-HTML-whitespace-normalization"
                target="_blank">http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-HTML-whitespace-normalization</a><br>
              to NIF, you make many decisions about what to drop
              (whitespace nodes, the content of the HTML "head" element,
              or "script" elements inside "body") and how to segment
              (e.g. not extracting the content of "span" separately but
              as part of "p"). Together with nif:isString,
              nif:referenceContext gives you clear information about what
              the complete extracted string is.<br>
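              <br>
              A rough Python sketch of this point, under clearly stated
              assumptions: the input is a hypothetical list of
              already-parsed text nodes (no real HTML parser is involved),
              and both extraction policies are invented for illustration.
              It shows that the whitespace decision changes the complete
              extracted string, and therefore both the reference context
              URI (#char=0,N) and the offsets of a substring such as
              "world" that points at it. The dictionary keys only mirror
              the NIF property names nif:isString, nif:beginIndex,
              nif:endIndex and nif:referenceContext; no RDF library is
              used, and the helper names are hypothetical.<br>
<pre>
```python
# Hypothetical sketch: two extraction policies over the same text nodes.
# The node list stands in for a parsed HTML "body"; it is not real parser output.

BASE = "http://example.com/doc.html"

nodes = ["\n  ", "Hello", "\n  ", "world", "\n"]


def keep_all(nodes):
    """Policy 1 (assumed): concatenate every text node, whitespace included."""
    return "".join(nodes)


def drop_whitespace_nodes(nodes):
    """Policy 2 (assumed): drop whitespace-only nodes, join the rest with one space."""
    return " ".join(n for n in nodes if n.strip())


def describe(text, word):
    """Return plain dicts mirroring NIF property names for the context and one word."""
    context = BASE + "#char=0," + str(len(text))
    begin = text.index(word)
    end = begin + len(word)
    return {
        "context": {"uri": context, "nif:isString": text},
        "word": {
            "uri": BASE + "#char=" + str(begin) + "," + str(end),
            "nif:beginIndex": begin,
            "nif:endIndex": end,
            "nif:referenceContext": context,
        },
    }


for policy in (keep_all, drop_whitespace_nodes):
    d = describe(policy(nodes), "world")
    print(policy.__name__, d["context"]["uri"], d["word"]["uri"])

# keep_all http://example.com/doc.html#char=0,17 http://example.com/doc.html#char=11,16
# drop_whitespace_nodes http://example.com/doc.html#char=0,11 http://example.com/doc.html#char=6,11
```
</pre>
              Plain dicts keep the sketch dependency-free; a real converter
              would emit RDF triples instead.<br>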

              <br>

              Best,<br>

              <br>

              Felix<br>

              <br>

              <blockquote type="cite">

                <div class="im">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div>Presumably the same assertion would also be
                          true of &lt;Alcoholism.txt#char=0,91427&gt;,
                          but if you are trying to encode document-level
                          metadata and you have an identifier for the
                          document, why not use it?</div>

                        <div><br>

                        </div>

                        <div>Steve</div>

                        <div>&nbsp;</div>

                      </div>

                      -- <br>

                      Department of Computing, Macquarie University

                      <div><a moz-do-not-send="true"

                          href="http://web.science.mq.edu.au/%7Ecassidy/"

                          target="_blank">http://web.science.mq.edu.au/~cassidy/</a></div>

                    </div>

                  </div>


                </div>

                <div class="im">

                  <pre>_______________________________________________

NLP2RDF mailing list

<a moz-do-not-send="true" href="mailto:NLP2RDF@lists.informatik.uni-leipzig.de" target="_blank">NLP2RDF@lists.informatik.uni-leipzig.de</a>

<a moz-do-not-send="true" href="http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf" target="_blank">http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf</a>

</pre>

                </div>

              </blockquote>

              <br>

            </div>

          </blockquote>

        </div>


      </div>

    </blockquote>

    <br>

  </body>

</html>