[NLP2RDF] document and corpus level aggregates
Sebastian Hellmann
hellmann at informatik.uni-leipzig.de
Wed May 29 09:38:10 CEST 2013
Hi all,
as you might know, we have been talking in small groups about the
individual problems. Now it is time to take all issues to this list and
tackle them one by one.
I chose this topic to start with, as it was relevant for Pablo and Max
from Spotlight[1], Mladen and Uros for DS2[2] and LOD2, and Harald from
Poolparty (SWC)[3]. We might also reuse the lemon model. Don't worry, we
will consider all issues (e.g. HPSG raised by Michael or an algorithm
for salience requested by Jean-Marc). I will collect them, make a list
and then propose them for discussion. Of course you are welcome to post
to this list and use GitHub for issue tracking and contributions (via
pull request). The ontologies and examples are here:
https://github.com/NLP2RDF/persistence.uni-leipzig.org
The basic request was to include things such as term counts and tf-idf
measures, as well as probabilities and statistics about which surface
forms are used for which entities (for Spotlight).
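To make the request concrete, here is a minimal sketch of how term counts and tf-idf could be computed over a small collection of contexts. The whitespace tokenisation, the function names and the toy input strings are assumptions for illustration only; none of this is part of NIF or of the draft.

```python
import math
from collections import Counter


def term_counts(text):
    """Count terms using naive lower-cased whitespace tokenisation (an assumption)."""
    return Counter(text.lower().split())


def tf_idf(contexts):
    """contexts: dict mapping a context id to its text.

    Returns {context_id: {term: tf-idf score}} using relative term
    frequency and idf = log(N / document frequency).
    """
    counts = {cid: term_counts(text) for cid, text in contexts.items()}
    n_contexts = len(counts)

    # Document frequency: in how many contexts does each term occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    scores = {}
    for cid, c in counts.items():
        total = sum(c.values())
        scores[cid] = {
            term: (k / total) * math.log(n_contexts / df[term])
            for term, k in c.items()
        }
    return scores


# Toy strings standing in for the two example documents.
contexts = {
    "Alcoholism.txt": "benzodiazepines are useful in acute alcohol withdrawal",
    "Arachnophobia.txt": "sedatives are used in the treatment of phobias",
}
scores = tf_idf(contexts)
```

A term that occurs in every context (here "are" and "in") gets a score of zero, while terms specific to one context get a positive score. How such values would be attached to a context or a collection in RDF is exactly the open question of this thread.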
The basic unit in NIF is the nif:Context, so the document level is
covered whenever the string in a nif:Context equals the content of a
document. I made a draft (example) for a nif:ContextCollection, which is
an arbitrary unordered grouping of several contexts (in this example,
the contexts coincide with Wikipedia articles).
The example:
*************
Two documents in Wikisyntax:
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Alcoholism.wiki
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Arachnophobia.wiki
get converted to text:
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Alcoholism.txt
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/Arachnophobia.txt
During this process, wikilinks to Benzodiazepine are removed and added as
annotations:
Alcoholism.wiki: [[Benzodiazepines]], while useful in the management of acute alcohol withdrawal
Arachnophobia.wiki: In addition [[beta blockers]], [[serotonin reuptake inhibitors]] and [[benzodiazepines|sedatives]] are used in the treatment of phobias.
<Alcoholism.txt#char=37028,37043>
    a nif:RFC5147String ;
    nif:beginIndex "37028" ;
    nif:endIndex "37043" ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Benzodiazepine> ;
    nif:referenceContext <Alcoholism.txt#char=0,91429> .
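The conversion step could look roughly like the following sketch. It is not the code that produced the example files; the regular expression, the function names and the tuple layout are assumptions. It only handles simple [[target]] and [[target|label]] links, uses Python string indices as offsets, and does not resolve a page title such as "benzodiazepines" to the DBpedia resource Benzodiazepine.

```python
import re

# Matches [[target]] and [[target|label]]; nested links or templates are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def strip_wikilinks(wikitext):
    """Return (plain_text, annotations).

    Each annotation is (begin, end, surface_form, target), with begin/end
    as offsets into the returned plain text.
    """
    parts = []
    annotations = []
    pos = 0      # read position in the wiki source
    out_len = 0  # length of the plain text emitted so far
    for m in WIKILINK.finditer(wikitext):
        before = wikitext[pos:m.start()]
        parts.append(before)
        out_len += len(before)

        target = m.group(1)
        surface = m.group(2) or target  # the label wins if present
        annotations.append((out_len, out_len + len(surface), surface, target))

        parts.append(surface)
        out_len += len(surface)
        pos = m.end()
    parts.append(wikitext[pos:])
    return "".join(parts), annotations


def char_uri(document, begin, end):
    """Build a fragment identifier in the #char=begin,end style used above."""
    return f"{document}#char={begin},{end}"


wiki = ("In addition [[beta blockers]], [[serotonin reuptake inhibitors]] and "
        "[[benzodiazepines|sedatives]] are used in the treatment of phobias.")
text, annotations = strip_wikilinks(wiki)
```

For every annotation, text[begin:end] equals the surface form, so the offsets can be used directly for nif:beginIndex and nif:endIndex as well as for the fragment URI.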
The terms are:
sedatives and Benzodiazepines
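As an illustration of the kind of surface-form statistics meant here, the sketch below turns (surface form, entity) pairs into relative frequencies per entity. This is not the Spotlight model; the function name and the input layout are assumptions, and the two pairs are just the terms from this example.

```python
from collections import Counter, defaultdict


def surface_form_stats(pairs):
    """pairs: iterable of (surface_form, entity).

    Returns {entity: {surface_form: relative frequency}}, i.e. an estimate
    of P(surface form | entity) from raw annotation counts.
    """
    counts = defaultdict(Counter)
    for surface, entity in pairs:
        counts[entity][surface] += 1

    stats = {}
    for entity, c in counts.items():
        total = sum(c.values())
        stats[entity] = {surface: k / total for surface, k in c.items()}
    return stats


# The two annotations from the example, both pointing to the same entity.
pairs = [
    ("Benzodiazepines", "Benzodiazepine"),
    ("sedatives", "Benzodiazepine"),
]
stats = surface_form_stats(pairs)
```

With only these two annotations, each surface form gets a relative frequency of 0.5 for the entity Benzodiazepine. Values of this kind are aggregates over a whole collection of contexts rather than over a single document.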
The whole example can be found here:
http://persistence.uni-leipzig.org/nlp2rdf/examples/wikilex/contextcollection
The Spotlight model can be found at the end.
All the best,
Sebastian
[1] http://wiki.dbpedia.org/Datasets/NLP
[2] http://static.lod2.eu/Deliverables/D3.6-Final.pdf
[3] http://www.poolparty.biz/
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org,
Deadline: *July 8th*)
Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org