<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Hi Steve,<br>

      <blockquote type="cite">Thanks Felix, is there a difference though

        between making an assertion about the document and making one

        about the string that results from pre-processing the document?&nbsp;</blockquote>

      <br>

      Documents are really tricky, technically as well as philosophically

      (abstract identity, ship of Theseus). Off the top of my head I

      couldn't even define what "document" means exactly. <br>

      <br>

      Basically you can never be certain what hides behind a document

      URL. Here are some examples:<br>

      1. non-information resources: <a class="moz-txt-link-freetext" href="http://dbpedia.org/resource/London">http://dbpedia.org/resource/London</a><br>

      2. A multilingual CMS normally implements a fallback mechanism to

      English if a translated page is missing in, e.g., German. So while

      the language of the document would be German, the content would be

      English. <br>

      3. <a class="moz-txt-link-freetext" href="http://www.w3.org/DesignIssues/LinkedData.html">http://www.w3.org/DesignIssues/LinkedData.html</a> has been edited

      several times, most recently in 2009. What does the URI refer to in

      this case: all versions or only the latest?<br>

      <br>

      For NLP we should use the document URL only for information that

      does not concern the content. This makes everything much easier and

      more interoperable. <br>

      <br>

      <blockquote type="cite">It's perhaps the difference between "this

        document has 300 words" and "when I process this document like

        this it has 300 words".&nbsp;</blockquote>

      That is one major difficulty for interoperability. Only the latter

      statement is reproducible. nif:Context is a more fine-grained way of

      modelling this, as it points to the text itself. It doesn't really

      matter whether that text is a whole document or not (e.g. a sentence

      or a paragraph), so you can model paragraphs in exactly the same way. <br>
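      <br>
      As a rough sketch of the point above (not part of NIF itself): the
      same document URL yields different whole-text URIs depending on the
      pre-processing, while paragraphs are addressed in exactly the same
      way as the whole text. The helper names char_uri, context_uri and
      paragraph_uris, the sample text, and the blank-line paragraph rule
      are all assumptions made for illustration; offsets here count Python
      code points.<br>
      <pre>
```python
def char_uri(document_url: str, begin: int, end: int) -> str:
    """Build an offset-based URI for the span [begin, end) of a processed string."""
    return f"{document_url}#char={begin},{end}"


def context_uri(document_url: str, text: str) -> str:
    """URI of the whole processed string (what a nif:Context would point to)."""
    return char_uri(document_url, 0, len(text))


def paragraph_uris(document_url: str, text: str) -> list:
    """URIs of blank-line separated paragraphs, built like the context URI."""
    separator = "\n\n"  # assumed paragraph delimiter
    uris = []
    offset = 0
    for paragraph in text.split(separator):
        end = offset + len(paragraph)
        uris.append(char_uri(document_url, offset, end))
        offset = end + len(separator)
    return uris


DOC = "http://example.com/exampledoc.html"

# Component A keeps the text as-is; component B collapses all whitespace.
raw = "Hello  world.\n\nSecond paragraph here."
collapsed = " ".join(raw.split())

# Same document URL, but two different "whole document" character ranges.
print(context_uri(DOC, raw))
print(context_uri(DOC, collapsed))
print(paragraph_uris(DOC, raw))
```
      </pre>
      The two context URIs differ although both components believe they
      describe the whole document, which is why a statement such as a word
      count is only reproducible relative to the processed string.<br>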

      <br>

      <br>

      <blockquote type="cite">

        <div style="">I guess the question is for a processing component

          that wants to make an assertion in its output about the

          document as a whole so that a subsequent step can use it.

          &nbsp;Should it use the input document URI or make an assertion

          about the character range that it used to represent the

          document internally. &nbsp;Given that the character range might be

          different between different components, it would seem useful

          to have a way of making assertions about the whole document

          that didn't depend on how it was pre-processed.</div>

        <div style=""><br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

          <div bgcolor="#FFFFFF" text="#000000"> Can you give a triple

            and a sparql query that only works if we drop #char=0,29

            from the URI?<br>

            <div class="im"><br>

            </div>

          </div>

        </blockquote>

        <div style="">Well, it would be the result of two components

          making assertions about different character ranges each

          believing that it is making an assertion about the whole

          document.</div>

      </blockquote>

      <br>

      Wouldn't this be a client issue, i.e. a question of how the client

      merges the annotations? How is this handled traditionally?

      nif:sourceUrl is currently still unstable and underspecified. <br>

      Maybe we can find a use case for this issue and then decide. <br>

      For the ITS use case this is completely irrelevant, because NIF is

      only used in the Web service conversion scenario. That is:&nbsp; ITS in

      HTML -&gt; Text (or NIF) -&gt; NLP webservice -&gt; NIF output

      -&gt; merge with ITS in HTML. <br>

      <br>

      There are several options (maybe more):<br>

      1. Use a special identifier such as&nbsp; #char=0,&nbsp; to denote the whole

      character range. Everything is then merged automatically. <br>

      2. The client can merge annotations by copying them: <br>

      construct {<br>

      &nbsp;&nbsp;&nbsp; &lt;newUri#char=x,x&gt; ?p ?o <br>

      } where { <br>

      &nbsp;&nbsp;&nbsp; ?context ?p ?o .<br>

      &nbsp;&nbsp;&nbsp; ?context nif:sourceUrl &lt;document&gt;. <br>

      }<br>
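      <br>
      The CONSTRUCT query above can be mirrored in a few lines of plain
      Python, which may make the copying step easier to follow. This is
      only a sketch under assumptions: triples are modelled as simple
      tuples rather than with an RDF library, merge_by_copying is a
      hypothetical helper name, xx:language is an invented example
      property, and the client is assumed to pick one character range as
      the target URI.<br>
      <pre>
```python
NIF_SOURCE_URL = "nif:sourceUrl"  # property name as used in the query above


def merge_by_copying(triples, document_url, target_uri):
    """Copy all statements about contexts of document_url onto target_uri."""
    # WHERE part 1: every ?context with nif:sourceUrl pointing to the document.
    contexts = {
        s for (s, p, o) in triples
        if p == NIF_SOURCE_URL and o == document_url
    }
    # WHERE part 2 and CONSTRUCT: re-attach each ?p ?o to the target URI.
    return {(target_uri, p, o) for (s, p, o) in triples if s in contexts}


DOC = "http://example.com/exampledoc.html"
OTHER = "http://example.com/other.html"  # unrelated document, must be ignored
TARGET = DOC + "#char=0,37"  # range chosen by the client (assumption)

triples = {
    # Two components, each describing "the whole document" differently.
    (DOC + "#char=0,37", NIF_SOURCE_URL, DOC),
    (DOC + "#char=0,37", "xx:wordcount", 5),
    (DOC + "#char=0,35", NIF_SOURCE_URL, DOC),
    (DOC + "#char=0,35", "xx:language", "en"),
    (OTHER + "#char=0,10", NIF_SOURCE_URL, OTHER),
    (OTHER + "#char=0,10", "xx:wordcount", 2),
}

merged = merge_by_copying(triples, DOC, TARGET)
for triple in sorted(merged, key=str):
    print(triple)
```
      </pre>
      Like the query, this also copies the nif:sourceUrl statement itself
      onto the target URI, and it does not check whether a copied value
      (e.g. a count) is still valid for the target character range.<br>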

      <br>

      Could you elaborate on what kind of annotations you are referring

      to?<br>

      Using the document URI makes sense for certain annotations (e.g.

      dc:publisher), but not for others (e.g. nif:count). <br>

      All the best,<br>

      Sebastian<br>

      <br>

      On 30.05.2013 09:01, Steve Cassidy wrote:<br>

    </div>

    <blockquote

cite="mid:CADg8aoinp3bPqSPK=hkNwG0NHpK_b+R7Ec5L2oiVAjgkQ-SVrg@mail.gmail.com"

      type="cite">

      <div dir="ltr">On 30 May 2013 16:39, Felix Sasaki <span dir="ltr">&lt;<a

            moz-do-not-send="true" href="mailto:fsasaki@w3.org"

            target="_blank">fsasaki@w3.org</a>&gt;</span> wrote:

        <div><br>

          <div class="gmail_extra">

            <div class="gmail_quote">

              <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                <div bgcolor="#FFFFFF" text="#000000"> Well, to avoid

                  the problem you need two pieces of information:<br>

                  - document URI independent of complete character range<br>

                  - document URI + complete character range <br>

                  <a moz-do-not-send="true"

                    href="http://example.com/exampledoc.html#=char=0,29"

                    target="_blank">http://example.com/exampledoc.html#=char=0,29</a>

                  gives you both, and the ability to distinguish between

                  different calculations of complete character ranges.<br>

                </div>

              </blockquote>

              <div style=""><br>

              </div>

              <div>&lt;<a moz-do-not-send="true"

                  href="http://example.com/exampledoc.html#=char=0,29"

                  target="_blank">http://example.com/exampledoc.html#=char=0,29</a>&gt;

                xx:wordcount 5 .</div>

              <div>&lt;<a moz-do-not-send="true"

                  href="http://example.com/exampledoc.html"

                  target="_blank">http://example.com/exampledoc.html</a>&gt;

                xx:wordcount 5 .<br>

              </div>

              <div><br>

              </div>

              <div style="">These are two separate statements and not

                related unless we say</div>

              <div style=""><br class="">

                &lt;<a moz-do-not-send="true"

                  href="http://example.com/exampledoc.html"

                  target="_blank">http://example.com/exampledoc.html</a>&gt;&nbsp;</div>

              <div style="">&nbsp; &nbsp; &nbsp; &nbsp; xx:full_character_range &lt;<a

                  moz-do-not-send="true"

                  href="http://example.com/exampledoc.html#=char=0,29"

                  target="_blank">http://example.com/exampledoc.html#=char=0,29</a>&gt;

                .</div>

              <div style=""><br>

              </div>

              <div style="">which of course you could assert. &nbsp;</div>

              <div style=""><br>

              </div>

              <div style="">I guess the question is for a processing

                component that wants to make an assertion in its output

                about the document as a whole so that a subsequent step

                can use it. &nbsp;Should it use the input document URI or

                make an assertion about the character range that it used

                to represent the document internally. &nbsp;Given that the

                character range might be different between different

                components, it would seem useful to have a way of making

                assertions about the whole document that didn't depend

                on how it was pre-processed.</div>

              <div style=""><br>

              </div>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                <div bgcolor="#FFFFFF" text="#000000"> Can you give a

                  triple and a sparql query that only works if we drop

                  #=char=0,29 from the URI?<br>

                  <div class="im"><br>

                  </div>

                </div>

              </blockquote>

              <div style="">Well, it would be the result of two

                components making assertions about different character

                ranges each believing that it is making an assertion

                about the whole document.</div>

              <div style=""><br>

              </div>

              <div style="">Steve</div>

              <div style=""><br>

              </div>

            </div>

            -- <br>

            Department of Computing, Macquarie University

            <div><a moz-do-not-send="true"

                href="http://web.science.mq.edu.au/%7Ecassidy/"

                target="_blank">http://web.science.mq.edu.au/~cassidy/</a></div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    <br>

    <div class="moz-signature">-- <br>

      Dipl. Inf. Sebastian Hellmann<br>

      Department of Computer Science, University of Leipzig <br>

      Events: NLP &amp; DBpedia 2013

      (<a class="moz-txt-link-freetext" href="http://nlp-dbpedia2013.blogs.aksw.org">http://nlp-dbpedia2013.blogs.aksw.org</a>, Deadline: *July 8th*)<br>

      Come to Germany as a PhD:

      <a class="moz-txt-link-freetext" href="http://bis.informatik.uni-leipzig.de/csf">http://bis.informatik.uni-leipzig.de/csf</a><br>

      Projects: <a class="moz-txt-link-freetext" href="http://nlp2rdf.org">http://nlp2rdf.org</a> , <a class="moz-txt-link-freetext" href="http://linguistics.okfn.org">http://linguistics.okfn.org</a> ,

      <a class="moz-txt-link-freetext" href="http://dbpedia.org/Wiktionary">http://dbpedia.org/Wiktionary</a> , <a class="moz-txt-link-freetext" href="http://dbpedia.org">http://dbpedia.org</a><br>

      Homepage: <a class="moz-txt-link-freetext" href="http://bis.informatik.uni-leipzig.de/SebastianHellmann">http://bis.informatik.uni-leipzig.de/SebastianHellmann</a><br>

      Research Group: <a class="moz-txt-link-freetext" href="http://aksw.org">http://aksw.org</a><br>

    </div>

  </body>

</html>