<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi Steve,<br>
<blockquote type="cite">Thanks Felix, is there a difference though
between making an assertion about the document and making one
about the string that results from pre-processing the document? </blockquote>
<br>
Documents are really tricky, technically as well as philosophically
(abstract identity, ship of Theseus). Off the top of my head I
couldn't even define exactly what "document" means. <br>
<br>
Basically you can never be certain what hides behind a document
URL. Here are some examples:<br>
1. non-information resources: <a class="moz-txt-link-freetext" href="http://dbpedia.org/resource/London">http://dbpedia.org/resource/London</a><br>
2. A multilingual CMS normally implements a fallback mechanism to
English if a translated page is missing, e.g. in German. So while
the language of the document would be German, the content would be
English. <br>
3. <a class="moz-txt-link-freetext" href="http://www.w3.org/DesignIssues/LinkedData.html">http://www.w3.org/DesignIssues/LinkedData.html</a> has been edited
several times, most recently in 2009. What does the URI refer to in
this case? All versions or only the latest?<br>
<br>
For NLP we should use the document URL only for information that
does not concern the content. This makes everything much easier and
more interoperable. <br>
<br>
<blockquote type="cite">It's perhaps the difference between "this
document has 300 words" and "when I process this document like
this it has 300 words". </blockquote>
That is one major difficulty for interoperability. The latter
statement is reproducible. nif:Context is a more granular way of
modeling this, as it points to the text itself. It doesn't really
matter whether that text is a whole document or not (e.g. a sentence
or paragraph), so you can model paragraphs in the same way. <br>
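<br>
To illustrate the reproducibility point, here is a minimal Python sketch (not part of NIF itself). It assumes the #char=begin,end fragment style used in this thread; the helper name context_uri and the sample string are made up for illustration. Two different pre-processing steps give different whole-text ranges for the same document URL, even though the word count happens to agree:<br>

```python
# Hypothetical helper: builds a URI for the whole pre-processed string,
# using the #char=begin,end fragment style discussed in this thread.
def context_uri(document_url, text):
    return "{}#char=0,{}".format(document_url, len(text))


DOCUMENT = "http://example.com/exampledoc.html"

# Made-up raw text, as it might come out of an HTML-to-text step.
raw = "  Hello   world,  this is NIF.  "

# Component A only trims leading and trailing whitespace.
text_a = raw.strip()
# Component B also collapses internal runs of whitespace.
text_b = " ".join(raw.split())

uri_a = context_uri(DOCUMENT, text_a)
uri_b = context_uri(DOCUMENT, text_b)

# Both components count the same words, but over different strings.
word_count_a = len(text_a.split())
word_count_b = len(text_b.split())

print(uri_a, word_count_a)
print(uri_b, word_count_b)
```

Each statement is reproducible against its own string, whereas the same statement on the bare document URL would not say which pre-processing produced it.<br>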
<br>
<br>
<blockquote type="cite">
<div style="">I guess the question is for a processing component
that wants to make an assertion in its output about the
document as a whole so that a subsequent step can use it.
Should it use the input document URI or make an assertion
about the character range that it used to represent the
document internally? Given that the character range might be
different between different components, it would seem useful
to have a way of making assertions about the whole document
that didn't depend on how it was pre-processed.</div>
<div style=""><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Can you give a triple
and a sparql query that only works if we drop #char=0,29
from the URI?<br>
<div class="im"><br>
</div>
</div>
</blockquote>
<div style="">Well, it would be the result of two components
making assertions about different character ranges each
believing that it is making an assertion about the whole
document.</div>
</blockquote>
<br>
Wouldn't this be a client issue, namely how to merge these? How is
this handled traditionally? nif:sourceUrl is currently still unstable
and underspecified. <br>
Maybe we can find a use case for this issue and then decide. <br>
For the ITS use case this is completely irrelevant, because NIF is
only used in the Web service conversion scenario, that is: ITS in
HTML -&gt; Text (or NIF) -&gt; NLP web service -&gt; NIF output
-&gt; merge with ITS in HTML. <br>
<br>
There are several options (maybe more):<br>
1. Use a special identifier such as #char=0, to denote the whole
character range. Everything then merges automatically. <br>
2. The client can merge annotations by copying them: <br>
CONSTRUCT {<br>
&lt;newUri#char=x,x&gt; ?p ?o . <br>
} WHERE { <br>
?context ?p ?o .<br>
?context nif:sourceUrl &lt;document&gt; . <br>
}<br>
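<br>
The copying in option 2 can also be sketched on the client side in plain Python, with triples held as tuples. This is only an illustration under assumptions: merge_by_source, the merged URI and the xx:language property are hypothetical, and xx:wordcount is the placeholder property used elsewhere in this thread.<br>

```python
NIF_SOURCE_URL = "nif:sourceUrl"


def merge_by_source(triples, document, new_uri):
    """Copy all statements of contexts that point to `document` onto `new_uri`.

    Mirrors the CONSTRUCT pattern: every (context, p, o) is copied if the
    context has nif:sourceUrl == document, including the sourceUrl triple.
    """
    contexts = {s for (s, p, o) in triples
                if p == NIF_SOURCE_URL and o == document}
    return {(new_uri, p, o) for (s, p, o) in triples if s in contexts}


DOCUMENT = "http://example.com/exampledoc.html"
OTHER = "http://example.com/other.html"
# Hypothetical target URI chosen by the client.
NEW_URI = "http://example.com/merged#char=0,29"

# Two components used different character ranges for the same document.
triples = {
    (DOCUMENT + "#char=0,29", NIF_SOURCE_URL, DOCUMENT),
    (DOCUMENT + "#char=0,29", "xx:wordcount", 5),
    (DOCUMENT + "#char=0,31", NIF_SOURCE_URL, DOCUMENT),
    (DOCUMENT + "#char=0,31", "xx:language", "en"),
    # A context of an unrelated document must not be copied.
    (OTHER + "#char=0,10", NIF_SOURCE_URL, OTHER),
    (OTHER + "#char=0,10", "xx:wordcount", 2),
}

merged = merge_by_source(triples, DOCUMENT, NEW_URI)
for triple in sorted(merged, key=str):
    print(triple)
```

Note that such a merge simply puts both components' statements on one subject; whether that is meaningful depends on the kind of annotation.<br>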
<br>
Could you elaborate on what kind of annotations you are referring
to?<br>
Using the document URI makes sense for certain annotations (e.g.
dc:publisher), but not for others (e.g. nif:count). <br>
All the best,<br>
Sebastian<br>
<br>
On 30.05.2013 09:01, Steve Cassidy wrote:<br>
</div>
<blockquote
cite="mid:CADg8aoinp3bPqSPK=hkNwG0NHpK_b+R7Ec5L2oiVAjgkQ-SVrg@mail.gmail.com"
type="cite">
<div dir="ltr">On 30 May 2013 16:39, Felix Sasaki <span dir="ltr"><<a
moz-do-not-send="true" href="mailto:fsasaki@w3.org"
target="_blank">fsasaki@w3.org</a>></span> wrote:
<div><br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Well, to avoid
the problem you need two pieces of information:<br>
- document URI independent of complete character range<br>
- document URI + complete character range <br>
<a moz-do-not-send="true"
href="http://example.com/exampledoc.html#=char=0,29"
target="_blank">http://example.com/exampledoc.html#=char=0,29</a>
gives you both, and the ability to distinguish between
different calculations of complete character ranges.<br>
</div>
</blockquote>
<div style=""><br>
</div>
<div><<a moz-do-not-send="true"
href="http://example.com/exampledoc.html#=char=0,29"
target="_blank">http://example.com/exampledoc.html#=char=0,29</a>>
xx:wordcount 5 .</div>
<div>&lt;<a moz-do-not-send="true"
href="http://example.com/exampledoc.html"
target="_blank">http://example.com/exampledoc.html</a>&gt;
xx:wordcount 5 .<br>
</div>
<div><br>
</div>
<div style="">These are two separate statements and not
related unless we say</div>
<div style=""><br>
&lt;<a moz-do-not-send="true"
href="http://example.com/exampledoc.html"
target="_blank">http://example.com/exampledoc.html</a>&gt; </div>
<div style=""> xx:full_character_range <<a
moz-do-not-send="true"
href="http://example.com/exampledoc.html#=char=0,29"
target="_blank">http://example.com/exampledoc.html#=char=0,29</a>>
.</div>
<div style=""><br>
</div>
<div style="">which of course you could assert. </div>
<div style=""><br>
</div>
<div style="">I guess the question is for a processing
component that wants to make an assertion in its output
about the document as a whole so that a subsequent step
can use it. Should it use the input document URI or
make an assertion about the character range that it used
to represent the document internally? Given that the
character range might be different between different
components, it would seem useful to have a way of making
assertions about the whole document that didn't depend
on how it was pre-processed.</div>
<div style=""><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Can you give a
triple and a sparql query that only works if we drop
#=char=0,29 from the URI?<br>
<div class="im"><br>
</div>
</div>
</blockquote>
<div style="">Well, it would be the result of two
components making assertions about different character
ranges each believing that it is making an assertion
about the whole document.</div>
<div style=""><br>
</div>
<div style="">Steve</div>
<div style=""><br>
</div>
</div>
-- <br>
Department of Computing, Macquarie University
<div><a moz-do-not-send="true"
href="http://web.science.mq.edu.au/%7Ecassidy/"
target="_blank">http://web.science.mq.edu.au/~cassidy/</a></div>
</div>
</div>
</div>
</blockquote>
<br>
<br>
<div class="moz-signature">-- <br>
Dipl. Inf. Sebastian Hellmann<br>
Department of Computer Science, University of Leipzig <br>
Events: NLP &amp; DBpedia 2013
(<a class="moz-txt-link-freetext" href="http://nlp-dbpedia2013.blogs.aksw.org">http://nlp-dbpedia2013.blogs.aksw.org</a>, Deadline: *July 8th*)<br>
Come to Germany as a PhD:
<a class="moz-txt-link-freetext" href="http://bis.informatik.uni-leipzig.de/csf">http://bis.informatik.uni-leipzig.de/csf</a><br>
Projects: <a class="moz-txt-link-freetext" href="http://nlp2rdf.org">http://nlp2rdf.org</a> , <a class="moz-txt-link-freetext" href="http://linguistics.okfn.org">http://linguistics.okfn.org</a> ,
<a class="moz-txt-link-freetext" href="http://dbpedia.org/Wiktionary">http://dbpedia.org/Wiktionary</a> , <a class="moz-txt-link-freetext" href="http://dbpedia.org">http://dbpedia.org</a><br>
Homepage: <a class="moz-txt-link-freetext" href="http://bis.informatik.uni-leipzig.de/SebastianHellmann">http://bis.informatik.uni-leipzig.de/SebastianHellmann</a><br>
Research Group: <a class="moz-txt-link-freetext" href="http://aksw.org">http://aksw.org</a><br>
</div>
</body>
</html>