[NLP2RDF] KB versioning problem of text annotations

Thu Sep 24 12:02:16 CEST 2015

Dear all,

I would like to know what you think about handling the versions of
annotations.

Problem description:
For tasks like entity extraction, entity recognition and entity
linking/disambiguation there are datasets containing documents and the
entities annotated inside those documents. Creating such gold standard
datasets is very time-consuming since humans have to annotate the documents
using the URIs of a given knowledge base (KB), e.g., DBpedia.
However, such a dataset is always based on a specific version of the KB.
Over time, the URIs of some of the entities inside the dataset might
change, or entities that were not part of the KB before might be added to
it. Thus, the dataset might become outdated.
For our own dataset [1] we added the itsrdf:taSource property to the
annotations; its value contains the version of the KB. For the GERBIL
project [2] we would like to apply this approach to other datasets, too.
Unfortunately, there seem to be no established URIs for specific versions
of DBpedia. For our dataset we used a simple string, but it would be much
better if we could use an already established set of URIs for the different
KB versions.
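
To make the idea concrete, an annotation carrying the KB version might look
roughly like the output of the following sketch. It is plain Python that
prints Turtle. The document URI, the offsets, the helper names and the
version string "DBpedia 2014" are illustrative assumptions and are not
taken from the dataset in [1].

```python
# Namespace URIs of the ITS 2.0 RDF vocabulary, the NIF core ontology and XSD.
PREFIXES = {
    "itsrdf": "http://www.w3.org/2005/11/its/rdf#",
    "nif": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def turtle_literal(text):
    """Quote a Python string as a Turtle string literal (hypothetical helper)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def annotation_to_turtle(doc_uri, begin, end, surface, entity_uri, kb_version):
    """Serialise one entity annotation as Turtle (hypothetical helper).

    kb_version is a plain string here; a URI identifying the KB version
    could be written as <...> instead if such URIs existed.
    """
    subject = f"<{doc_uri}#char={begin},{end}>"
    statements = [
        f"nif:anchorOf {turtle_literal(surface)}",
        f'nif:beginIndex "{begin}"^^xsd:nonNegativeInteger',
        f'nif:endIndex "{end}"^^xsd:nonNegativeInteger',
        f"itsrdf:taIdentRef <{entity_uri}>",
        # The KB version the annotation was created against.
        f"itsrdf:taSource {turtle_literal(kb_version)}",
    ]
    header = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    body = subject + "\n    " + " ;\n    ".join(statements) + " ."
    return "\n".join(header) + "\n\n" + body


# Illustrative values only (assumptions, not real dataset content).
example = annotation_to_turtle(
    doc_uri="http://example.org/doc1",
    begin=0,
    end=7,
    surface="Leipzig",
    entity_uri="http://dbpedia.org/resource/Leipzig",
    kb_version="DBpedia 2014",
)
print(example)
```

With an established URI per KB version, only the last statement would
change: the string literal would become a resource, so that tools could
compare the version of an annotation with the version of the KB they use.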

Do you think using the property itsrdf:taSource is a good solution to
handle the versioning problem of annotations?
Are there already established URIs for different versions of well-known KBs?

Cheers
Michael

[1] https://github.com/AKSW/n3-collection
[2] http://aksw.org/Projects/GERBIL.html