<div dir="ltr">Dear Martin, dear all,<div><br></div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

while converting the TIGER corpus and its dependency trees to NIF, I bumped against some limitations of the NIF ontology:<br>

<br>

At the moment, the only dedicated property for dependency structures is &quot;nif:dependency&quot;, pointing from the head to the dependant. However, I think an inverse property to that would also be nice to have. So I propose &quot;nif:phraseHead&quot;, pointing in the other direction.<br>

</blockquote><div><br></div><div>And actually, that would be more conformant to quasi-standardized formats such as CoNLL, which point from dependent to head. In CoNLL, this is a technical artifact, though: in a tree, the head is unique for each dependent, but not vice versa, and thus the tabular CoNLL format requires exactly two columns rather than an indefinite number. </div>
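<div><br></div><div>To illustrate the asymmetry, here is a minimal Python sketch. It assumes a simplified, CoNLL-like row layout (token id, form, head id, relation label); the sample rows and the function names head_of and dependents_of are made up for illustration and are not part of CoNLL or NIF. The dependent-to-head direction (what nif:phraseHead would express) yields exactly one value per token, while the head-to-dependents direction (what nif:dependency expresses) yields a list of arbitrary length.</div>
<pre>
```python
from collections import defaultdict

# Simplified CoNLL-like rows (an assumption for illustration):
# token id, word form, head id (0 = artificial root), relation label.
ROWS = [
    (1, "Peter", 2, "SB"),
    (2, "sleeps", 0, "ROOT"),
    (3, "soundly", 2, "MO"),
]


def head_of(rows):
    """Dependent to head: exactly one entry per token, like a HEAD column."""
    return {tok_id: head for tok_id, _form, head, _rel in rows}


def dependents_of(rows):
    """Head to dependents: a list of arbitrary length per head."""
    deps = defaultdict(list)
    for tok_id, _form, head, _rel in rows:
        deps[head].append(tok_id)
    return dict(deps)


heads = head_of(ROWS)
dependents = dependents_of(ROWS)
```
</pre>
<div>Both maps are built in a single pass over the rows, so deriving one direction from the other is cheap.</div>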

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">A completely missing property is &quot;nif:dependencyRelationType&quot;, annotating the type of the dependency relation as a literal, just like &quot;nif:posTag&quot; does for POS tags.<br>

</blockquote><div><br></div><div>Can you provide a more elaborate example? Do you mean to annotate dependency relations at the dependents? This may be ambiguous. For schemes that allow multiple heads/parents for the same dependent/child (e.g., TIGER, SALSA), the label needs to be attached to the relation itself. To provide a link with OLiA and stay within OWL2/DL, POWLA used reification to represent labelled edges. As I understand NIF, however, reification is currently not recommended, because for its original use case as an interchange format in NLP pipelines, it creates too much overhead.</div>
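<div><br></div><div>A small Python sketch of the ambiguity, under stated assumptions: the edges, token numbers and labels below are invented for illustration (they are not taken from TIGER or SALSA), and Edge, labels_at_dependent and label_of are hypothetical names. Token 3 has two heads with different labels. If the labels are stored at the dependent, we can no longer tell which label belongs to which head; if each edge is an object of its own (the reification-style representation), the label stays recoverable.</div>
<pre>
```python
from collections import namedtuple

# A reified edge: the relation is an object of its own and carries the label.
Edge = namedtuple("Edge", ["head", "dependent", "label"])

# Invented example: token 3 is a dependent of both token 2 and token 5.
EDGES = [
    Edge(head=2, dependent=1, label="SB"),
    Edge(head=2, dependent=3, label="OA"),
    Edge(head=5, dependent=3, label="SB"),
    Edge(head=2, dependent=5, label="OC"),
]


def labels_at_dependent(edges):
    """Collapse labels onto the dependent, losing the head they belong to."""
    labels = {}
    for edge in edges:
        labels.setdefault(edge.dependent, []).append(edge.label)
    return labels


def label_of(edges, head, dependent):
    """Look up the label on the edge itself; None if there is no such edge."""
    for edge in edges:
        if edge.head == head and edge.dependent == dependent:
            return edge.label
    return None


collapsed = labels_at_dependent(EDGES)
```
</pre>
<div>Here collapsed[3] holds two labels without their heads, whereas label_of(EDGES, 5, 3) and label_of(EDGES, 2, 3) return distinct labels. The price is one extra object per edge, which is the overhead mentioned above.</div>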

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

It would also be nice to have some property that annotates the root node of a sentence that could be used to traverse the dependency tree of the sentence just like &quot;nif:firstWord&quot; and &quot;nif:nextWord&quot; enable traversing surface structure of the sentence.<br>

</blockquote><div><br></div><div>In POWLA, we had such a root feature, as it was intended for corpus querying rather than merely representing NLP output. For querying, this is quite essential. But again, it increases the overhead. The original idea we had with Sebastian when he started his work on NIF was to develop a division of labour between POWLA and NIF, with NIF being an interchange format as minimal (and as compact) as possible, and POWLA a formalism ready for OWL2/DL-supported corpus querying. For different reasons (new positions, new obligations, etc.), working out the mapping between both stalled at some point, but if there is interest from the community to work more in the corpus direction with NIF, I would support taking this endeavour up again.</div>

<div><br></div><div>In any case, one should carefully distinguish two different use cases: annotation exchange and corpus querying, as they have diametrically opposed requirements in terms of expressivity and formality (less important for annotation exchange, more important for querying) and compactness (vice versa). Pushing both into the same formalism would be a compromise optimal for neither application. Unless a use case emerges in which both requirements meet (and I don&#39;t see any), developing two bidirectionally mappable dialects of the same formalism would be preferable. A reasonable mapping should be possible in linear time, so that from a computational perspective, it would not add substantial overhead to a pipeline in which on-the-fly syntax annotation (hence, polynomial, at least) is involved.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">On a different note, the OLiA tags may need some changes:<br>


<br>

At the moment, there is nif:oliaLink and nif:oliaCategory used to link annotated words to respective OLiA resources. However, these resources can either be morphological or syntactic annotations. The properties themselves don&#39;t make it sufficiently clear if the oliaLink is used to link to a POS tag category or a syntactic category, like &quot;NounPhrase&quot;. I think this is semantically ambiguous. If OLiA is used for different classes of annotation, the properties should reflect this. So the tags should rather be &quot;nif:oliaPosLink&quot; and &quot;nif:oliaSyntaxLink&quot; or something like that.<br>

</blockquote><div><br></div><div>From the OLiA perspective, this is a non-issue, as the Reference Model remains agnostic about the annotation layer (pos, syntax, whatever) an annotation comes from. This is because different schemes follow quite different strategies to distribute morphological, syntactic or semantic information across different annotation layers (e.g., *semantic* properties such as being a locative adverb may be encoded on the POS level [Susanne corpus], on the dependency level [Stanford deps], in edge labels of a constituency tree [TIGER], in node labels of a constituency tree [Penn Historical Corpora] and of course on NER or SRL levels).</div>

<div><br></div><div>Concepts in an OLiA Annotation Model are intrinsically tied to an annotation layer, though. This is what the hasTier property is intended for. It is not widely used, however, and may require reassessment, because real-life tier (annotation level) identifiers are variable rather than constant. In layer-based annotation tools such as ELAN or EXMARaLDA, tier ids can be freely defined, and this is used, for example, for dialog annotation. Then, there would be multiple pos layers, for example, and from the perspective of the annotation model, we have no idea whether these are called &quot;STTS1..n&quot; (for the German standard POS tagset), &quot;POS1..n&quot; or &quot;Wortarten1..n&quot; (for German POS) or whatever.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Another point in question is that NIF is rather dependent on OLiA categories. Now some tagsets used to annotate corpora are not mapped by OLiA. Users might also not agree with the OLiA categories themselves and might like to define own categories. There is no way to support such additions. Of course we could speak to Christian Chiarcos about additions to OLiA, but I don&#39;t know how open he will be to collaborative additions and changes to his model. My vague proposal (just an idea at this point) would be:<br>

</blockquote><div><br></div><div>I&#39;m open to additions and discussions about modifications. Modifications in OLiA should be monotone (until a version change there should be no deletions, merely deprecation), but possible, and don&#39;t depend on me as a person, but on the collaborators on sourceforge (please don&#39;t hesitate to contact me if you want to contribute).</div>

<div><br></div><div>In any case, OLiA encourages linking with &quot;external reference models&quot; precisely in the way that Martin suggests. However, unless there is a strong call for it from the community, I would advise against building yet another ontology at the moment, as there are plenty around (ISOcat, GOLD, TDS, quite a few project-specific ones). Instead, one may register novel categories in ISOcat (<a href="http://www.isocat.org/">http://www.isocat.org/</a>) and include them in the OLiA-ISOcat linking via rdfs:subPropertyOf and rdfs:subClassOf.</div>

<div><br></div><div>ISOcat provides definitions, URIs, and (optionally) hierarchical relations between concepts, all of which can be exported to RDF. It is, however, not an ontology, but a semistructured, and extendable, list of data categories, so an ontology defining relations between (established or newly created) ISOcat categories may be provided in addition. As I understand it, this was the idea of the RELcat addition to ISOcat. In this way, user-specific semantics can be added to ISOcat URIs. A sample ISOcat ontology for the morphosyntactic profile can be found in the &quot;experimental&quot; branch of the OLiA sourceforge repository together with the (experimental) linking. If there is demand from the community, I can polish them up and integrate them into the stable dump.</div>

<div> </div><div>All the best,</div><div>Christian</div></div></div></div>