[NLP2RDF] document and corpus level aggregates

Felix Sasaki fsasaki at w3.org
Thu May 30 08:14:42 CEST 2013


On 30.05.13 08:07, Steve Cassidy wrote:
>
>
>     The basic unit in NIF is the nif:Context, so the document-level is
>     covered, when the string in a nif:Context equals the content of a
>     document. 
>
>     ...
>     <Alcoholism.txt#char=37028,37043>
>             a  nif:RFC5147String ;
>             nif:beginIndex "37028" ;
>             nif:endIndex "37043" ;
>             itsrdf:taIdentRef <http://dbpedia.org/resource/Benzodiazepine> ;
>             nif:referenceContext <Alcoholism.txt#char=0,91429>  .
>
>
> Just wondering why you don't use <Alcoholism.txt> when making 
> assertions about the document as a whole rather than giving the entire 
> character range as a qualifier.

Hi Steve,

Sebastian may have a different answer, but here is my view, based on how 
this is used in ITS 2.0: when you convert a document like
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-HTML-whitespace-normalization
to NIF, you make a lot of decisions about what to drop (white space 
nodes, the content of HTML "head", or "script" inside "body") and how to 
segment (e.g. extracting the content of a "span" not separately but as 
part of its "p"). The extracted string therefore need not be identical 
to the content of the original document. Together with nif:isString, 
nif:referenceContext tells you clearly what the complete extracted 
string is.
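
To make this concrete, here is a rough sketch in Python of such a
conversion. It is only an illustration under my own assumptions, not the
ITS 2.0 or NIF reference algorithm: the sample HTML, the file name
"demo.html", the class ContextExtractor and the function to_nif are all
made up for this example, and prefix declarations are left out as in the
snippet quoted above. It drops "head" and "script", collapses white
space, and then computes offsets against the extracted string rather
than against the raw file.

```python
import re
from html.parser import HTMLParser


class ContextExtractor(HTMLParser):
    """Collects text content, dropping everything inside <head> and <script>."""

    SKIPPED = {"head", "script"}  # illustrative choice of what to drop

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # <span> text arrives here like any other text, so it stays part of <p>.
        if not self._skip_depth:
            self._chunks.append(data)

    def context_string(self):
        # Collapse white space runs and trim the ends.
        return re.sub(r"\s+", " ", "".join(self._chunks)).strip()


def to_nif(doc, text, mention):
    """Builds Turtle for the context and for the first occurrence of mention."""
    begin = text.index(mention)
    end = begin + len(mention)
    context = f"<{doc}#char=0,{len(text)}>"
    literal = text.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"{context}\n"
        f"        a  nif:Context ;\n"
        f'        nif:isString "{literal}" .\n'
        f"\n"
        f"<{doc}#char={begin},{end}>\n"
        f"        a  nif:RFC5147String ;\n"
        f'        nif:beginIndex "{begin}" ;\n'
        f'        nif:endIndex "{end}" ;\n'
        f"        nif:referenceContext {context} .\n"
    )


# Hypothetical input document, invented for this sketch.
html = """<html><head><title>Demo</title></head>
<body><p>Alcohol   withdrawal is treated with
<span>benzodiazepines</span>.</p><script>var x = 1;</script></body></html>"""

parser = ContextExtractor()
parser.feed(html)
parser.close()

text = parser.context_string()
turtle = to_nif("demo.html", text, "benzodiazepines")
print(turtle)
```

The offsets in the output only make sense relative to the string given
in nif:isString. A converter that kept the "title" text or did not
collapse white space would produce different offsets for the same file,
which is why the annotation points to the context and not just to the
document.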

Best,

Felix

>  Presumably the same assertion would be true of 
> <Alcoholism.txt#char=0,91427>  too but if you are trying to encode 
> document level meta-data and you have an identifier for the document, 
> why not use it?
>
> Steve
> -- 
> Department of Computing, Macquarie University
> http://web.science.mq.edu.au/~cassidy/ 
>
>
> _______________________________________________
> NLP2RDF mailing list
> NLP2RDF at lists.informatik.uni-leipzig.de
> http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
