[NLP2RDF] WWW paper scope

ciro cbaron at informatik.uni-leipzig.de
Mon Sep 21 19:56:39 CEST 2015


Dear All,

Since we discussed a wide range of ideas for the paper at the 09/09 
meeting, I've had time to implement some of them:

a) number of links among datasets (similarity based on entities)
b) link strength among datasets. I've defined this as the number of 
resources (with the same namespace) divided by the number of links 
(that describe the namespace).
c) similarity based on properties. I used the Jaccard index 
(https://en.wikipedia.org/wiki/Jaccard_index) to express how similar 
two distributions are.
d) our diagram showing the three items above (deployed a new instance 
here: http://cirola2000.cloudapp.net:3001/#/diagram; please do not yet 
try similarity/strength values below 0.2)
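To make items b) and c) concrete, here is a minimal Python sketch of the two metrics. The property sets and the link/resource counts are hypothetical stand-ins, not taken from our actual datasets, and the link-strength ratio follows my reading of the definition above:

```python
def jaccard(props_a, props_b):
    """Jaccard index |A intersect B| / |A union B| over two property sets."""
    union = props_a | props_b
    if not union:
        return 0.0
    return len(props_a & props_b) / len(union)

def link_strength(num_shared_resources, num_links):
    """Link strength as defined above: resources sharing a namespace
    divided by the number of links describing that namespace.
    (Hypothetical inputs; counts would come from the triple store.)"""
    if num_links == 0:
        return 0.0
    return num_shared_resources / num_links

# Hypothetical property distributions of two datasets
props_a = {"rdfs:label", "dct:subject", "foaf:name"}
props_b = {"rdfs:label", "foaf:name", "dbo:birthDate"}

print(jaccard(props_a, props_b))   # 2 shared of 4 total -> 0.5
print(link_strength(5, 10))        # 5 resources over 10 links -> 0.5
```

Both values fall in [0, 1], which is what lets the diagram in d) use a single threshold slider for similarity and strength.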

I think these items are our main contributions regarding implementation. 
What we still need is a deeper analysis of the outcomes and use cases 
of these implementations. For example, what are the benefits of a 
dataset being strongly linked and having high similarity? And the other 
way round? I remember Sebastian saying that datasets with high link 
strength contain complementary data, and that datasets with high 
property similarity can be easily integrated. This makes perfect sense 
to me, and we could let these thoughts guide the paper.

Given the current status of the project, I think we fit better in the 
Semantic Web track than in the Web research track. Maybe once we have a 
wider scope implemented (e.g., a linking-completeness feature, and 
extending things beyond RDF data) we could try the Web research track.

Please share your ideas and let me know what you think!

All the best,
Ciro.
