[NLP2RDF] NIF 1.0 specification: Issue with Document class, sourceURL, sourceString properties

Stephane Fellah sfellah at smartrealm.com
Sun Dec 4 21:47:53 CET 2011


Hi,

The NIF 1.0 specification indicates:

----
The definition of Document in NIF is closely tied to the request
issued to annotate it. So each piece of text that is sent to a service
is treated as a document. This produces three additional triples per
Document (or request).

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
@prefix str: <http://nlp2rdf.lod2.eu/schema/string/> .
ld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F rdf:type
str:OffsetBasedString .
ld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F rdf:type str:Document .
ld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F str:sourceUrl
<http://www.w3.org/DesignIssues/LinkedData.html> .
----

In addition, the specification says:

" In each returned NIF model there should be at least one uri that
relates to the document as a whole and either references the page with
the property str:sourceUrl or includes the whole text of the document
with str:sourceString"

Having sourceString as a property of the String class would be very
verbose if you have many annotations on the same document. I suggest
that the Document be the domain of sourceUrl and sourceString; the
Document can have the same identifier as the sourceUrl.
I also suggest that we introduce a property called "source" to relate
the String to its Document.
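
On the schema side, the suggestion amounts to moving the domain of str:sourceUrl and str:sourceString from the String to the Document and adding the new str:source property. A minimal RDFS sketch, assuming str:source is the new property proposed here (it is not part of NIF 1.0) and that str:String is the right domain for it:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix str: <http://nlp2rdf.lod2.eu/schema/string/> .

# Proposed: the document-level properties describe the Document, not a String
str:sourceUrl rdfs:domain str:Document .
str:sourceString rdfs:domain str:Document .

# Proposed (not in NIF 1.0): relates an annotated String to its Document
str:source rdfs:domain str:String ;
    rdfs:range str:Document .
```

Note that rdfs:domain is an inference rule rather than a constraint: any resource carrying str:sourceString would be inferred to be a str:Document, which is the intended effect here.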

With these changes, the example would look like this:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
@prefix str: <http://nlp2rdf.lod2.eu/schema/string/> .
ld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F rdf:type
str:OffsetBasedString .
ld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F rdf:type str:Document .
ld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F str:source
<http://www.w3.org/DesignIssues/LinkedData.html> .

<http://www.w3.org/DesignIssues/LinkedData.html> rdf:type str:Document .
<http://www.w3.org/DesignIssues/LinkedData.html> str:sourceUrl
<http://www.w3.org/DesignIssues/LinkedData.html> .
<http://www.w3.org/DesignIssues/LinkedData.html> str:sourceString
"text of the document...." .

Also, shouldn't we use foaf:Document instead of str:Document? They seem
to have the same semantics. A str:TextDocument (a subclass of
foaf:Document) may be more appropriate in this case.
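
The foaf:Document variant could be declared along these lines. This is a sketch only: str:TextDocument is the hypothetical class suggested above, not an existing NIF term, and the instance triples reuse the page from the example in the specification:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix str: <http://nlp2rdf.lod2.eu/schema/string/> .

# Hypothetical class for documents whose content is text
str:TextDocument rdfs:subClassOf foaf:Document .

<http://www.w3.org/DesignIssues/LinkedData.html> a str:TextDocument ;
    str:sourceUrl <http://www.w3.org/DesignIssues/LinkedData.html> .
```

With RDFS reasoning, the page is then also a foaf:Document, so consumers that only know FOAF can still recognize it.
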
I also have an issue with representing each piece of text as a Document.
It is very misleading, as the String class really represents a string
annotation "on" a Document.

Sincerely

--
Stephane Fellah, M.Sc, B.Sc
Principal Engineer/Product Manager
smartRealm LLC
201 Loudoun St. SW
Leesburg, VA 20175
Tel: 703 669 5514
Cell: 571 502 8478
Fax: 703 669 5515

