[NLP2RDF] [Corpora-List] Announcement: NLP Interchange Format (NIF)

John F. Sowa sowa at bestweb.net
Fri Dec 9 17:39:19 CET 2011


Dear Michael and Jens,

JFS
>> I sent a note to Ontolog Forum (copy below), which addresses
>> many of the points raised in this thread.

MB
> Which would have been a better place for you to start the thread.

The talk by R. V. Guha, who was the original designer of RDF, was
sponsored by Ontolog Forum last week.  It started a thread on that
list.  When NLP2RDF was announced on Corpora List, I thought it was
appropriate to alert the developers and potential users of NIF about
that talk and its implications.

MB
> schema.org is part of RDF: http://schema.org/docs/datamodel.html
>
> "The data model used is very generic and derived from RDF Schema"

That quotation is taken out of context.  See the full statement:

schema.org
> The data model used is very generic and derived from RDF Schema.
> (which in turn was derived from CycL, which in turn ...).

CycL is the very rich logic of the Cyc system, which Guha had helped
design and implement while he was an associate director of Cyc.  The
three dots refer to the many developments in AI, logic, comp. sci.,
linguistics, and NLP that influenced Cyc.  In designing RDF, Guha
aimed for a very limited, simple notation based on just binary
relations (which C. S. Peirce introduced in 1870).  He hoped that it
could be a starting point, which would evolve into the much richer
logic that was necessary for AI, NLP, comp. sci., and linguistics.
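
To illustrate what a restriction to binary relations entails, here is
a minimal Python sketch.  It is not taken from Guha's talk or from any
RDF library; the function name to_triples, the node name give1, and
the role labels are hypothetical.  It re-expresses one ternary fact,
gives(Mary, John, book), as four binary relations attached to a node
that stands for the event: one that types the node and one for each
argument.

```python
from typing import Dict, List, Tuple

# A binary relation instance: (subject, relation, object).
Triple = Tuple[str, str, str]

def to_triples(node_id: str, relation: str, roles: Dict[str, str]) -> List[Triple]:
    """Re-express an n-ary fact as binary relations on one event node.

    node_id and the role names are illustrative assumptions, not a standard.
    """
    triples: List[Triple] = [(node_id, "type", relation)]
    for role, filler in roles.items():
        triples.append((node_id, role, filler))
    return triples

# The ternary fact gives(Mary, John, book) becomes four binary relations.
triples = to_triples(
    "give1", "Give",
    {"agent": "Mary", "recipient": "John", "theme": "book"},
)
for t in triples:
    print(t)
```

A single n-ary statement therefore costs n+1 binary statements plus an
invented node name, which is part of why a binary-only notation was
seen as a starting point rather than an end point.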

But as he said in his talk, "Somehow RDF never caught on."  He did not
mean that nobody uses it, but that it failed to achieve the widespread
use that the W3C had hoped for.  In response to a question about using
LISP (which I asked), Guha said "I wish we could have done that."

Most of the other people who had any experience in AI also wished
that they could have used LISP.  That includes Ora Lassila, who wrote
a proposal in 1997 for a LISP-like version, and Pat Hayes, who defined
the LBase semantics with Guha.  Pat was also a coauthor of another web
page you cited (http://www.w3.org/TR/rdf-mt/).  Hayes & Menzel extended
LBase for the semantics of ISO standard 24707 for Common Logic (CL).

MB
> Nobody said that RDF is bound to RDFs and OWL/DL. If you think that
> many people would sacrifice decidability and low computational
> complexity for more expressional power, just define your own semantic
> extension. You can have unrestricted first order logic - LBase
> is just that.

The WHERE-clause of SQL has the full expressive power of first-order
logic for expressing queries and constraints.  And that version of logic
runs the world economy.  One of the major reasons why "RDF never caught
on" for commercial web sites is that nearly all of them are built around
a relational database.  The limited expressive power of RDF and OWL is
one of the major deterrents to using them for commercial web sites.
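
As one small illustration of quantifiers in a WHERE-clause, the
following sketch uses Python's standard sqlite3 module with three
hypothetical tables (supplier, part, supplies) and made-up rows.  The
query asks for every supplier s such that, for all parts p, s supplies
p.  SQL has no FORALL keyword, so the universal quantifier is written
as a negated existential:  NOT EXISTS (... NOT EXISTS ...).  This shows
one pattern only; it is not a proof of the general claim.

```python
import sqlite3

# In-memory database with hypothetical tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (name TEXT PRIMARY KEY);
    CREATE TABLE part (pname TEXT PRIMARY KEY);
    CREATE TABLE supplies (name TEXT, pname TEXT);
    INSERT INTO supplier VALUES ('acme'), ('bolt');
    INSERT INTO part VALUES ('nut'), ('screw');
    INSERT INTO supplies VALUES
        ('acme', 'nut'), ('acme', 'screw'), ('bolt', 'nut');
""")

# "For all parts p, s supplies p" is expressed as
# "there is no part p for which no supplies row links s to p".
query = """
    SELECT s.name FROM supplier s
    WHERE NOT EXISTS (
        SELECT 1 FROM part p
        WHERE NOT EXISTS (
            SELECT 1 FROM supplies x
            WHERE x.name = s.name AND x.pname = p.pname))
"""
rows = [r[0] for r in conn.execute(query)]
print(rows)  # ['acme']: only acme supplies every part
```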

As for NLP, every major notation for syntax or semantics requires at
least full FOL for its definition and/or for interchanging the results
of analyzing and interpreting NL sentences.  If you have any questions
about decidability, I recommend the following article:

    http://www.jfsowa.com/pubs/fflogic.pdf
    Fads and Fallacies about Logic

JL
> RDFa is used for embedding RDF in HTML pages. Hence, it is quite obvious
> that it is a better choice for schema.org than other RDF syntaxes. There
> are, of course, other scenarios in which you just want to exchange
> information (without HTML), in which one of the other RDF serialisations
> is more appropriate.

After schema.org was introduced, the RDF community responded with its
own web site that recommended ways of using RDF in conjunction with it.
See http://schema.rdfs.org

The first page of that web site presents a serialization of the
hierarchy of terms and definitions from schema.org.  It has links
to five different representations:  JSON (which Google and other
participants in schema.org recommend), CSV (Comma Separated Values),
and three serializations for RDF:  RDF/Turtle, RDF/XML, and RDF/N3.

Before making a firm commitment to any notation as a standard for NLP,
I suggest that you poll computational linguists and ask them what they
would prefer for their work.  For example, you could ask them to look
at those five serializations and check which one(s) they prefer.

Corpora List is a good place to start such a poll.

John

