[NLP2RDF] [Corpora-List] Announcement: NLP Interchange Format (NIF)

hellmann at informatik.uni-leipzig.de hellmann at informatik.uni-leipzig.de
Sat Dec 17 14:30:49 CET 2011


Dear John, dear lists,
There has been a detailed discussion on the Ontolog Forum.
I have read most of these emails, but the discussion became quite long
and might bore an outsider. Here is an overview:
http://ontolog.cim3.net/forum/ontolog-forum/2011-12/threads.html

@John:
Sadly, it seems that there are no concrete improvements we can make,  
based on the comments you made. You did not provide a feasible  
alternative. Your suggestions seem to require years to be put into  
practice.

In particular, your basic assumption that NIF was created for
computational linguists is slightly off. NIF was originally developed
for LOD2 ( http://lod2.eu ) and the Semantic Web community. We have
undergone an initial community review process and consulted personally
with over 40 people. This is also the reason why NIF is firmly built
on W3C standards such as URIs, RDF, OWL and others. Now, after the 1.0
release, we are hoping for feedback from a broader audience and invite
anybody to comment. We are, however, unable to leave this consistent
framework, because we would lose one major feature of NIF:
interoperability with Linked Data and the RDF world. The basic goal of
NIF is to unlock NLP for the RDF world and connect the Web of Data,
the Web of Documents and NLP.

Although my colleagues and I are also big fans of JSON (btw., as
others also mentioned, RDF and JSON do not contradict each other), it
just won't happen that we define our own semantics outside of RDF and
OWL, because we would lose more than we would gain. Personally, I also
think that the time of non-reusable island solutions should come to an
end and we should start to build on open standards, as we have done
with NIF.

I was actually wondering why you got stuck on the serialization of
NIF. It is one of the smallest and least interesting aspects, in my
opinion. There is a plethora of open parser and serializer
implementations available, so developers are relieved of a lot of
boilerplate. Many other aspects of NIF are more interesting and could
even be reused outside the RDF world. The design of the URIs, for
example, is quite universal. The mappings and the knowledge contained
in the provided ontologies can also be converted easily to other
formats.
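
To make the point about the URIs concrete, here is a minimal sketch
of an offset-based URI scheme. It is a simplified illustration, not
the exact NIF recipe: the "offset_BEGIN_END" fragment layout, the
function names and the example.org document URI are all assumptions
made for this example. The idea is only that a substring of a document
can be addressed by a URI that any format, RDF or not, can carry.

```python
from urllib.parse import urldefrag


def make_offset_uri(document_uri: str, begin: int, end: int) -> str:
    """Build a URI addressing the substring text[begin:end] of a document.

    The fragment layout "offset_BEGIN_END" is an assumption for this sketch.
    """
    if not 0 <= begin <= end:
        raise ValueError("need 0 <= begin <= end")
    return f"{document_uri}#offset_{begin}_{end}"


def parse_offset_uri(uri: str) -> tuple[str, int, int]:
    """Recover the document URI and the character offsets from such a URI."""
    document_uri, fragment = urldefrag(uri)
    prefix, begin, end = fragment.split("_")
    if prefix != "offset":
        raise ValueError(f"not an offset-based URI: {uri}")
    return document_uri, int(begin), int(end)


# Hypothetical document and annotation target.
text = "NIF is built on W3C standards."
uri = make_offset_uri("http://example.org/doc1", 0, 3)
doc, begin, end = parse_offset_uri(uri)
print(uri, "->", text[begin:end])
```

Nothing in this sketch depends on RDF, which is why I consider this
part reusable elsewhere.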

Of course, we are aware that the provided flexibility and reusability
add some performance overhead. Personally, I would recommend using
technologies like UIMA and RDBMS for performance-critical tasks. One
use case of NIF is to be able to easily replace one web service (e.g.
Zemanta or OpenCalais) with another one, as the interface is
standardized. Note that these services are available as web services
already, so the extra parsing overhead might be negligible. It would
be interesting to have some measurements showing whether the parsing
speed has any relevance compared to network speed and latency in the
real world.
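
Such a measurement could start from something as simple as the sketch
below. Everything in it is an assumption for illustration: json.loads
stands in for whichever RDF parser one wants to test, the payload is
made up, and the 100 ms round trip is a placeholder that should be
replaced by a measured value for the actual service.

```python
import json
import time


def mean_parse_seconds(parse, payload: str, repeats: int = 200) -> float:
    """Average wall-clock time of parse(payload) over several repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        parse(payload)
    return (time.perf_counter() - start) / repeats


def parse_share(parse_seconds: float, network_seconds: float) -> float:
    """Fraction of the total request time that is spent on parsing."""
    return parse_seconds / (parse_seconds + network_seconds)


# Made-up payload: 1000 token annotations. json.loads is only a stand-in
# for the parser under test.
payload = json.dumps(
    [{"begin": i, "end": i + 5, "type": "Token"} for i in range(1000)]
)

# Assumption: placeholder round-trip time; measure it for the real service.
ASSUMED_ROUND_TRIP_SECONDS = 0.1

parse_seconds = mean_parse_seconds(json.loads, payload)
share = parse_share(parse_seconds, ASSUMED_ROUND_TRIP_SECONDS)
print(f"parsing: {parse_seconds * 1000:.3f} ms, share of total: {share:.1%}")
```

The numbers this prints say nothing about NIF itself; the point is
only that the comparison is cheap to run once a real parser and a real
latency figure are plugged in.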

We are not aware of any other data model that we could have used.
Topic Maps might have been an option, but the tool support is not as
rich. JSON and XML are not really data models; I would rather count
them as serialization formats for data models.
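
The distinction between a data model and its serializations can be
shown with a toy example. The sketch below keeps one set of
subject/predicate/object triples and writes it out twice, once as JSON
and once as tab-separated lines. The example.org URIs, the vocabulary
terms and the line format are invented for this illustration; the line
format is not N-Triples and assumes the values contain no tabs or
newlines.

```python
import json

Triple = tuple[str, str, str]

# The data model: a set of triples. All URIs here are hypothetical.
triples: set[Triple] = {
    ("http://example.org/doc1#offset_0_3",
     "http://example.org/vocab#anchorOf", "NIF"),
    ("http://example.org/doc1#offset_0_3",
     "http://example.org/vocab#partOfSpeech", "Noun"),
}


def to_json(data: set[Triple]) -> str:
    """Serialize the triples as a JSON array of 3-element arrays."""
    return json.dumps(sorted(data))


def from_json(serialized: str) -> set[Triple]:
    return {tuple(item) for item in json.loads(serialized)}


def to_lines(data: set[Triple]) -> str:
    """Serialize the triples as tab-separated lines (toy format)."""
    return "\n".join("\t".join(triple) for triple in sorted(data))


def from_lines(serialized: str) -> set[Triple]:
    return {tuple(line.split("\t")) for line in serialized.splitlines() if line}


# Two different byte-level representations, one and the same model.
assert from_json(to_json(triples)) == triples
assert from_lines(to_lines(triples)) == triples
```

Both round trips give back the same set of triples, so the choice
between the two syntaxes does not change what is being said.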

Anyhow, I will have a look at the whole Ontolog discussion again and
see if there is something concrete we can use. I think the main
difficulty for the adoption of RDF was that people tried to use it for
tasks that it is not suited for (e.g. replacing relational databases).
For linking and data integration, however, it seems to be working
quite well, hence we used it for NIF.
Annotations are a form of linking, right?

All the best,
Sebastian



Quoting "John F. Sowa" <sowa at bestweb.net>:

> Dear Michael and Jens,
>
> JFS
>>> I sent a note to Ontolog Forum (copy below), which addresses
>>> many of the points raised in this thread.
>
> MB
>> Which would have been a better place for you to start the thread.
>
> The talk by R. V. Guha, who was the original designer of RDF, was
> sponsored by Ontolog Forum last week.  It started a thread on that
> list.  When NLP2RDF was announced on Corpora List, I thought it was
> appropriate to alert the developers and potential users of NIF about
> that talk and its implications.
>
> MB
>> schema.org is part of RDF: http://schema.org/docs/datamodel.html
>>
>> "The data model used is very generic and derived from RDF Schema"
>
> That quotation is taken out of context.  See the full statement:
>
> schema.org
>> The data model used is very generic and derived from RDF Schema.
>> (which in turn was derived from CycL, which in turn ...).
>
> CycL is the very rich logic of the Cyc system, which Guha had helped
> design and implement while he was an associate director of Cyc.  The
> three dots refer to the many developments in AI, logic, comp. sci.,
> linguistics, and NLP that influenced Cyc.  In designing RDF, Guha
> tried to design a very limited, simple notation based on just binary
> relations (which C. S. Peirce introduced in 1870).  He hoped that
> could be a starting point, which would evolve into the much richer
> logic that was necessary for AI, NLP, comp. sci., and linguistics.
>
> But as he said in his talk, "Somehow RDF never caught on."  He did not
> mean that nobody uses it, but that it failed to achieve the widespread
> use that the W3C had hoped for.  In response to a question about using
> LISP (which I asked), Guha said "I wish we could have done that."
>
> Most of the other people who had any experience in AI also wished
> that they could have used LISP.  That includes Ora Lassila, who wrote
> a proposal in 1997 for a LISP-like version, and Pat Hayes, who defined
> the LBase semantics with Guha.  Pat was also a coauthor of another web
> page you cited: http://www.w3.org/TR/rdf-mt/  Hayes & Menzel extended
> LBase for the semantics of ISO standard 24707 for Common Logic (CL).
>
> MS
>> Nobody said that RDF is bound to RDFs and OWL/DL. If you think that
>> many people would sacrifice decidability and low computational
>> complexity for more expressional power, just define your own semantic
>> extension. You can have unrestricted first order logic - LBase
>> is just that.
>
> The WHERE-clause of SQL has the full expressive power of first-order
> logic for expressing queries and constraints.  And that version of logic
> runs the world economy.  One of the major reasons why "RDF never caught
> on" for commercial web sites is that nearly all of them are built around
> a relational database.  The limited expressive power of RDF and OWL is
> one of the major deterrents to using it for commercial web sites.
>
> As for NLP, every major notation for syntax or semantics requires at
> least full FOL for its definition and/or for interchanging the results
> of analyzing and interpreting NL sentences.  If you have any questions
> about decidability, I recommend the following article:
>
>    http://www.jfsowa.com/pubs/fflogic.pdf
>    Fads and Fallacies about Logic
>
> JL
>> RDFa is used for embedding RDF in HTML pages. Hence, it is quite obvious
>> that it is a better choice for schema.org than other RDF syntaxes. There
>> are, of course, other scenarios in which you just want to exchange
>> information (without HTML), in which one of the other RDF serialisations
>> is more appropriate.
>
> After schema.org was introduced, the RDF community responded with its
> own web site that recommended ways of using RDF in conjunction with it.
> See http://schema.rdfs.org
>
> The first page of that web site presents a serialization of the
> hierarchy of terms and definitions from schema.org.  It has links
> to five different representations:  JSON (which Google and other
> participants in schema.org recommend), CSV (Comma Separated Values),
> and three serializations for RDF:  RDF/Turtle, RDF/XML, and RDF/N3.
>
> Before making a firm commitment to any notation as a standard for NLP,
> I suggest that you poll computational linguists and ask them what they
> would prefer for their work.  Among the questions you could ask is to
> look at those five serializations and check which one(s) they prefer.
>
> Corpora List is a good place to start such a poll.
>
> John


