[NLP2RDF] use of underscore vs. classical query scheme

Tue Jun 26 11:41:24 CEST 2012

Dear Sebastian,

Thanks a lot for such a complete answer. I understand there have been many discussions around this choice, and that I am more or less re-opening a thread that had already been closed.

Let me use your conclusions as a starting point to reason about other things.
But first, a little ASCII art I found while naively searching for "turtle equal sign" on Google:
http://www.kgbanswers.com/how-do-you-make-a-turtle-through-text/4242925

> We might change the syntax and reuse the RFC/MediaFrag syntax. The
> only problem is that I don't see any advantages.
> Are there e.g. libraries that specialize on reading fragments?
> Request parsing, yes, but fragment parameters? I think they are not widely
> supported or implemented.

Using the W3C validator for Turtle, I found that in a prefixed name not only does the character '=' fail validation, but so do the characters '(', ')', '"', ''', '/', '[', ']', '@', and of course ' ', ';', ',', '.', and more...
That's a pity for you, and for us in the MLW-LT working group, who were thinking about using a syntax derived from Media-Frags and a subset of XPath 1.0 to refer to an element node or an attribute node in a DOM. We can't even think about using prefixes in Turtle to write a URI such as ld:xpath=id("how to join")/span[1]@translate; instead, the fragment would have to be URL-encoded and would look like this: ld:xpath%3Did%28%22how%20to%20join%22%29%2Fspan%5B1%5D%40translate ... that's ugly!

Then I found a great summary of how Turtle relates to URIs, IRIs, percent-encoding and Unicode escaping:
http://lists.w3.org/Archives/Public/public-rdf-comments/2012Mar/0006.html
Correct me if I'm wrong, but my understanding of it is that the URI/IRI
  ld:xpath%3Did%28%22how%20to%20join%22%29%2Fspan%5B1%5D%40translate
where @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> ,
is exactly the same as the URI/IRI
  <http://www.w3.org/DesignIssues/LinkedData.html#xpath=id("how to join")/span[1]@translate>

My understanding is: no matter how ugly that is, percent-encoding is the only way we could use those characters in a fragment when using Turtle and prefixes, and it should be completely transparent to any URI/IRI processing tool.
Thus, after percent-decoding, it should be really easy to split fragments into pieces and read them.
I guess the Media-Frags group has already discussed these questions, and on this point I'd like to ask Raphaël Troncy (in cc of this mail) directly for his advice.
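
To make the round trip concrete, here is a minimal sketch using only Python's standard urllib.parse module. The variable names are mine, and splitting on the first '=' is just an assumption about how such a fragment could be read; it only illustrates the string-level encoding and decoding, not how any RDF tool actually compares the two forms.

```python
from urllib.parse import quote, unquote

# The fragment as we would like to write it (example taken from this mail).
fragment = 'xpath=id("how to join")/span[1]@translate'

# Percent-encode every reserved character; safe="" also encodes '/'.
encoded = quote(fragment, safe="")

# Decoding gives back the original fragment unchanged.
decoded = unquote(encoded)

# Assumption: the scheme name is whatever precedes the first '='.
scheme, _, value = decoded.partition("=")

print(encoded)
print(scheme, "|", value)
```

The encoded string is the same ugly form quoted above, and after decoding, the scheme ("xpath") and its value can be separated with a single split.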

> 1.  The optional part is not easy to handle, because you would need to
> add owl:sameAs statements:
> 
> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
> ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
> 
> So theoretically ok, but annoying to implement and check.
> 
I understand (the same goes for the order of the attribute-value pairs...). Could you think of any process to "normalize" the optional part?

> 3. Character like = , prevent the use of prefixes, e.g. in turtle:
> echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> ld:offset_717_729  owl:sameAs ld:char=717,12 .
> " > test.ttl ; rapper -i turtle  test.ttl
> 
> correct turtle:
> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
> owl:sameAs
> <http://www.w3.org/DesignIssues/LinkedData.html#char=717,12> .
> 
That could be percent-encoded, couldn't it?

Kind regards, 
Maxime