[NLP2RDF] document and corpus level aggregates
    Sebastian Hellmann 
    hellmann at informatik.uni-leipzig.de
       
    Wed May 29 09:38:10 CEST 2013
    
    
  
Hi all,
as you might know, we have been discussing the individual problems in 
small groups. Now it is time to bring all issues to this list and 
tackle them one by one.
I chose this topic to start with because it is relevant for Pablo and 
Max from Spotlight[1], Mladen and Uros for DS2[2] and LOD2, and Harald 
from Poolparty (SWC)[3]. We might also reuse the lemon model. Don't 
worry, we will consider all issues (e.g. HPSG, raised by Michael, or an 
algorithm for salience, requested by Jean-Marc). I will collect them, 
make a list and then propose them for discussion. Of course you are 
welcome to post to this list and to use GitHub for issue tracking and 
contributions (via pull request). The ontologies and examples are here:
https://github.com/NLP2RDF/persistence.uni-leipzig.org
The basic request was to include things such as term counts and tf-idf 
measures, as well as probabilities and statistics about which surface 
forms are used for which entities (for Spotlight).
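To make the requested aggregates concrete, here is a minimal Python 
sketch of term counts per document and tf-idf over a small collection. 
The whitespace tokenizer, the idf variant (log(N/df)) and the two toy 
strings are assumptions for illustration; they are not part of NIF or 
of the example files.

```python
import math
from collections import Counter

def term_counts(text):
    """Count lowercased, whitespace-separated tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())

def tf_idf(docs):
    """Return {doc_id: {term: score}} with raw term frequency and idf = log(N / df)."""
    counts = {doc_id: term_counts(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    return {
        doc_id: {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for doc_id, c in counts.items()
    }

# Toy stand-ins for two contexts (not the real article texts).
docs = {
    "Alcoholism.txt": "benzodiazepines are useful in alcohol withdrawal",
    "Arachnophobia.txt": "sedatives are used in the treatment of phobias",
}
scores = tf_idf(docs)
```

Term counts are a document-level aggregate, whereas the idf part needs 
the whole collection, which is why both levels matter here.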
The basic unit in NIF is the nif:Context, so the document level is 
covered when the string in a nif:Context equals the content of a document.
I made a draft (example) for a nif:ContextCollection, which is an 
arbitrary unordered grouping of several contexts (in this example, the 
contexts coincide with Wikipedia articles).
The example:
*************
Two documents in Wikisyntax:
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Alcoholism.wiki
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Arachnophobia.wiki
get converted to text:
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Alcoholism.txt
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Arachnophobia.txt
During this process, the wikilinks to Benzodiazepine are removed and 
added as annotations:
Alcoholism.wiki: [[Benzodiazepines]], while useful in the management of acute alcohol withdrawal
Arachnophobia.wiki: In addition [[beta blockers]], [[serotonin reuptake inhibitors]] and [[benzodiazepines|sedatives]] are used in the treatment of phobias.
<Alcoholism.txt#char=37028,37043>
	a  nif:RFC5147String ;
	nif:beginIndex "37028" ;
	nif:endIndex "37043" ;
	itsrdf:taIdentRef <http://dbpedia.org/resource/Benzodiazepine> ;
	nif:referenceContext <Alcoholism.txt#char=0,91429>  .
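The conversion step described above could be sketched in Python 
roughly as follows. This is only an illustration, not the converter 
that produced the example files: the regular expression handles plain 
[[target]] and [[target|label]] links only, the function names are 
hypothetical, and mapping a link target to a DBpedia resource 
(capitalisation, redirects) is left out.

```python
import re

# Matches [[target]] and [[target|label]]; nested links and templates are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def strip_wikilinks(wikitext):
    """Return (plain_text, annotations); each annotation is (begin, end, target)."""
    parts = []
    annotations = []
    pos = 0      # read position in the wikitext
    out_len = 0  # length of the plain text emitted so far
    for m in WIKILINK.finditer(wikitext):
        before = wikitext[pos:m.start()]
        parts.append(before)
        out_len += len(before)
        target = m.group(1)
        label = m.group(2) or target  # the visible text is the label if present
        annotations.append((out_len, out_len + len(label), target))
        parts.append(label)
        out_len += len(label)
        pos = m.end()
    parts.append(wikitext[pos:])
    return "".join(parts), annotations

def char_fragment(doc, begin, end):
    """Build an RFC 5147 style identifier such as Alcoholism.txt#char=37028,37043."""
    return f"{doc}#char={begin},{end}"

wiki = "In addition [[beta blockers]] and [[benzodiazepines|sedatives]] are used."
plain, anns = strip_wikilinks(wiki)
uris = [char_fragment("Arachnophobia.txt", b, e) for b, e, _ in anns]
```

The offsets refer to the plain text, not to the wiki source, so they 
can be used directly for nif:beginIndex and nif:endIndex.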
The terms are:
sedatives and Benzodiazepines
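As a rough illustration of the surface-form statistics requested for 
Spotlight, the following Python sketch turns (surface form, entity) 
pairs into relative frequencies per entity. The function name and the 
input format are assumptions, and the two pairs are just the two 
annotations from this example, not real corpus counts.

```python
from collections import Counter, defaultdict

def surface_form_stats(annotations):
    """Map each entity to {surface_form: relative frequency among its annotations}."""
    counts = defaultdict(Counter)
    for surface, entity in annotations:
        counts[entity][surface] += 1
    return {
        entity: {s: n / sum(c.values()) for s, n in c.items()}
        for entity, c in counts.items()
    }

BENZO = "http://dbpedia.org/resource/Benzodiazepine"
stats = surface_form_stats([
    ("Benzodiazepines", BENZO),  # from Alcoholism.txt
    ("sedatives", BENZO),        # from Arachnophobia.txt
])
```

With only these two annotations, each surface form gets a relative 
frequency of 0.5 for the entity.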
The whole example can be found here:
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/contextcollection
The Spotlight model can be found at the end.
All the best,
Sebastian
[1] http://wiki.dbpedia.org/Datasets/NLP
[2] http://static.lod2.eu/Deliverables/D3.6-Final.pdf
[3] http://www.poolparty.biz/
-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
Deadline: *July 8th*)
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
    
    