[NLP2RDF] document and corpus level aggregates

Sebastian Hellmann hellmann at informatik.uni-leipzig.de
Wed May 29 09:38:10 CEST 2013


Hi all,
as you might know, we have been discussing the individual problems in 
small groups. Now it is time to bring all issues to this list and 
tackle them one by one.
I chose this topic to start with because it is relevant for Pablo and Max 
from Spotlight[1], for Mladen and Uros for DS2[2] and LOD2, and for Harald 
from Poolparty (SWC)[3]; we might also reuse the lemon model. Don't worry, 
we will consider all issues (e.g. HPSG, raised by Michael, or an algorithm 
for salience, requested by Jean-Marc). I will collect them, make a list 
and then propose them for discussion. Of course, you are welcome to post 
to this list and to use GitHub for issue tracking and contributions (via 
pull requests). The ontologies and examples are here:
https://github.com/NLP2RDF/persistence.uni-leipzig.org

The basic request was to include things such as term counts and tf-idf 
measures, as well as statistics and probabilities about which surface 
forms are used for which entities (for Spotlight).

The basic unit in NIF is the nif:Context, so the document level is 
covered when the string of a nif:Context equals the content of a document.
I made a draft (example) for a nif:ContextCollection, which is an 
arbitrary, unordered grouping of several contexts (in the example, the 
contexts coincide with Wikipedia articles).
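To illustrate which aggregate lives at which level: a raw term count is a property of a single nif:Context (document level), whereas the idf part of tf-idf can only be computed over the whole nif:ContextCollection (corpus level). The following is a minimal Python sketch, not part of NIF. It assumes each context is a plain string, uses naive lowercase tokenization, and uses hypothetical identifiers and function names; real contexts would be the full article texts rather than the single sentences used here.

```python
import math
import re
from collections import Counter


def term_counts(context_text):
    """Document-level aggregate: raw term counts for one context string."""
    # Naive tokenization (an assumption): lowercase runs of ASCII letters.
    return Counter(re.findall(r"[a-z]+", context_text.lower()))


def tf_idf(collection):
    """Corpus-level aggregate: tf-idf for every term of every context.

    `collection` maps a context identifier to its text (a hypothetical
    stand-in for a nif:ContextCollection). tf is the raw count and
    idf = log(N / df), where N is the number of contexts.
    """
    counts = {ctx: term_counts(text) for ctx, text in collection.items()}
    n_contexts = len(counts)
    # Document frequency: the number of contexts each term occurs in.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    return {
        ctx: {term: tf * math.log(n_contexts / df[term]) for term, tf in c.items()}
        for ctx, c in counts.items()
    }


# Usage on the two sentences quoted from the example (after link removal).
collection = {
    "Alcoholism.txt": "Benzodiazepines, while useful in the management "
                      "of acute alcohol withdrawal",
    "Arachnophobia.txt": "In addition beta blockers, serotonin reuptake "
                         "inhibitors and sedatives are used in the "
                         "treatment of phobias.",
}
scores = tf_idf(collection)
```

Terms that occur in every context (such as "in" or "the" here) get an idf of log(1) = 0, so the per-context counts alone are not enough and the collection is needed to compute the score.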

The example:
*************
Two documents in Wikisyntax:
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Alcoholism.wiki
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Arachnophobia.wiki
get converted to text:
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Alcoholism.txt
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Arachnophobia.txt

During this process, the wikilinks to Benzodiazepine are stripped from 
the text and added as annotations:

Alcoholism.wiki: [[Benzodiazepines]], while useful in the management of acute alcohol withdrawal
Arachnophobia.wiki: In addition [[beta blockers]], [[serotonin reuptake inhibitors]] and [[benzodiazepines|sedatives]] are used in the treatment of phobias.



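@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<Alcoholism.txt#char=37028,37043>
	a nif:RFC5147String ;
	nif:beginIndex "37028"^^xsd:nonNegativeInteger ;
	nif:endIndex "37043"^^xsd:nonNegativeInteger ;
	itsrdf:taIdentRef <http://dbpedia.org/resource/Benzodiazepine> ;
	nif:referenceContext <Alcoholism.txt#char=0,91429> .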
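The offsets in the URI fragment and in nif:beginIndex/nif:endIndex are character positions in the string of the reference context, so the annotated surface form can be recovered by slicing that string. The following is a minimal Python sketch with hypothetical helper names. It only handles the "char=begin,end" form of RFC 5147 fragments and assumes that offsets count characters the same way Python string indices do. The 15-character span 37028 to 37043 is consistent with the anchor text "Benzodiazepines".

```python
import re

# Matches only the "char=begin,end" scheme of RFC 5147 at the end of a URI;
# other schemes (e.g. line=) are deliberately not handled here.
CHAR_FRAGMENT = re.compile(r"#char=(\d+),(\d+)$")


def parse_char_fragment(uri):
    """Return (begin, end) from a URI ending in '#char=begin,end'."""
    m = CHAR_FRAGMENT.search(uri)
    if m is None:
        raise ValueError(f"no char=begin,end fragment in {uri!r}")
    return int(m.group(1)), int(m.group(2))


def anchor_of(uri, context_text):
    """Surface form a string URI points to, sliced out of its context text.

    Assumes offsets count characters the same way Python str indices do.
    """
    begin, end = parse_char_fragment(uri)
    return context_text[begin:end]


begin, end = parse_char_fragment("Alcoholism.txt#char=37028,37043")
# The span length equals len("Benzodiazepines"), i.e. 15 characters.
span_length = end - begin
```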




The terms (surface forms) linked to Benzodiazepine are:

sedatives and Benzodiazepines
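For the Spotlight-style statistics, one simple estimate is the relative frequency of entities per surface form, P(entity | surface form), and of surface forms per entity, P(surface form | entity). The following is a minimal Python sketch, not the actual Spotlight model. It assumes the annotations have already been extracted as (surface form, itsrdf:taIdentRef) pairs, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def conditional_distribution(pairs):
    """Relative-frequency estimate of P(second | first) from (first, second) pairs."""
    by_first = defaultdict(Counter)
    for first, second in pairs:
        by_first[first][second] += 1
    return {
        first: {second: n / sum(c.values()) for second, n in c.items()}
        for first, c in by_first.items()
    }


BENZO = "http://dbpedia.org/resource/Benzodiazepine"
# (surface form, itsrdf:taIdentRef) pairs from the two wikilinks quoted above.
annotations = [
    ("Benzodiazepines", BENZO),  # Alcoholism.txt
    ("sedatives", BENZO),        # Arachnophobia.txt
]

# P(entity | surface form): how likely a surface form refers to an entity.
entity_given_form = conditional_distribution(annotations)
# P(surface form | entity): how an entity tends to be verbalized.
form_given_entity = conditional_distribution((e, f) for f, e in annotations)
```

With only these two annotations, every P(entity | surface form) is 1.0. The estimate becomes informative only when the same surface form (e.g. "sedatives") is linked to different entities elsewhere in the collection. Case folding of surface forms is left out here, and whether to apply it would be a design decision for the model.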


The whole example can be found here:
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/contextcollection
The Spotlight model is at the end of that file.

All the best,
Sebastian



[1] http://wiki.dbpedia.org/Datasets/NLP
[2] http://static.lod2.eu/Deliverables/D3.6-Final.pdf
[3] http://www.poolparty.biz/

-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
Deadline: *July 8th*)
Come to Germany for a PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

