[NLP2RDF] NIF: mandatory properties and types

Mon Jun 18 17:46:37 CEST 2012

Hi all,

I have a question about the choice to use an underscore '_' to separate the different parts of the URI fragment.

For the two existing recipes (offset-based and context-hash-based URIs), if there is an underscore '_' among the first 20 characters of the anchored string, the resulting NIF URI appears to have one part more than it really does (for instance, five parts instead of four). OK, that's not a big issue. But if you were to introduce a new NIF recipe in which one of the inner parts is a string, you would need to percent-encode its underscores (%5F). This is unconventional, as the underscore is an RFC 3986 Unreserved Character and classical urlencode methods don't replace it.
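
A minimal Python sketch of the ambiguity described above. The helper offset_fragment and the layout offset_<begin>_<end>_<string> are assumptions made for illustration, inferred from this message rather than taken from the NIF specification; only urllib.parse.quote is a real library call.

```python
from urllib.parse import quote

def offset_fragment(begin, end, text):
    # Hypothetical underscore-separated layout (an assumption, not the NIF spec):
    # offset_<begin>_<end>_<first 20 characters of the anchored string>
    return "offset_{}_{}_{}".format(begin, end, quote(text[:20]))

# quote() leaves '_' untouched because it is an RFC 3986 unreserved character.
frag = offset_fragment(14406, 14418, "Semantic_Web")

# Naively splitting on '_' now yields five parts instead of the intended four.
parts = frag.split("_")
print(frag)
print(len(parts))
```

A parser can still recover the trailing string here by limiting the number of splits, but that no longer works once a string appears in an inner position.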

I suggest that NIF 2.0 use a classical query scheme for the URI fragments: #nifRecipe=identifier(&param=val)*
This is also the direction taken in the W3C Media Fragments URI 1.0 Proposed Recommendation [1].

For instance:
Offset-based URIs: #nif=offset&begin=14406&end=14418&text=Semantic%20Web
Context-Hash-based URIs: #nif=hash&context=4&length=12&md5=79edde636fac847c006605f82d4c5c4d&text=Semantic%20Web

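This leads to slightly more verbose fragments, but you no longer need to escape RFC 3986 Unreserved Characters such as the underscore: the delimiters '&' and '=' are always percent-encoded (%26 and %3D) by classical urlencode methods when they occur inside a value.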
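
As a rough sketch of how the proposed query-style fragments could be built and parsed with Python's standard library: the function names build_fragment and parse_fragment are hypothetical, and the parameter names (nif, begin, end, text) simply follow the examples in this message.

```python
from urllib.parse import parse_qsl, quote, urlencode

def build_fragment(recipe, **params):
    # Hypothetical helper: "nif=<recipe>" first, then the parameters in call order.
    pairs = [("nif", recipe)] + list(params.items())
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    return urlencode(pairs, quote_via=quote)

def parse_fragment(fragment):
    # Split on '&' and '=', then percent-decode every name and value.
    return dict(parse_qsl(fragment, keep_blank_values=True))

offset = build_fragment("offset", begin=14406, end=14418, text="Semantic Web")
print(offset)

# '_' stays literal, while '&' and '=' inside the value are escaped,
# so the fragment remains unambiguous to parse.
tricky = build_fragment("offset", begin=0, end=7, text="a_b&c=d")
print(tricky)
print(parse_fragment(tricky)["text"])
```

Note that parse_qsl returns every value as a string, so numeric parameters such as begin and end would need an explicit int() conversion.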

[1] http://www.w3.org/TR/media-frags/

Regards,
Maxime Lefrançois