Simplified RDF

November 10, 2010

I propose that we designate a certain subset of the RDF model as “Simplified RDF” and standardize a method of encoding full RDF in Simplified RDF. The subset I have in mind is exactly the subset used by Facebook’s Open Graph Protocol (OGP), and my proposed encoding technique is relatively straightforward.

I’ve been mulling over this approach for a few months, and I’m fairly confident it will work, but I don’t claim to have all the details perfect yet. Comments and discussion are quite welcome, on this posting or on the semantic-web@w3.org mailing list. This discussion, I’m afraid, is going to be heavily steeped in RDF tech; Simplified RDF will be useful for people who don’t know all the details of RDF, but this discussion probably won’t be.

My motivation comes from several directions, including OGP. With OGP, Facebook has motivated a huge number of Web sites to add RDFa markup to their pages. But the RDF they’ve added is quite constrained, and is not practically interoperable with the rest of the Semantic Web, because it uses simplified RDF. One could argue that Facebook made a mistake here, that they should be requiring full “normal” RDF, but my feeling is their engineering decisions were correct, that this extreme degree of simplification is necessary to get any reasonable uptake.

I also think simplified RDF will play well with JSON developers. JRON is pretty simple, but simplified RDF would allow it to be simpler still. Or, rather, it would mean folks using JRON could limit themselves to an even smaller number of “easy steps” (about three, depending on how open design issues are resolved).

Cutting Out All The Confusing Stuff

Simplified RDF makes the following radical restrictions to the RDF model and to deployment practice:

The subject URIs are always web page addresses. The content-negotiation hack for “hash” URIs and the 303-see-other hack for “slash” URIs are both avoided.

(Open issue: are html fragment URIs okay? Not in OGP, but I think it will be okay and useful.)

The values of the properties (the “object” components of the RDF triples) are always strings. No datatype information is provided in the data, and object references are done by just putting the object URI into the string, instead of making it a normal URI-label node.

(Open issue: what about language tags? I think RDFa will provide this for free in OGP, if the html has a language tag.)

(Open issue: what about multi-valued (repeated) properties? Are they just repeated, or are the multiple values packed into the string, perhaps? OGP has multiple administrators listed as “USER_ID1,USER_ID2”. JSON lists are another factor here.)
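As a rough illustration of the two restrictions above, here is a minimal Python sketch that checks whether a triple fits the simplified model. The names `SimpleTriple` and `is_simplified`, the `allow_fragments` switch (which stands in for the open issue about fragment URIs), and the example URIs are assumptions made for this sketch, not part of OGP or any specification.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SimpleTriple(NamedTuple):
    subject: str    # always a web page address
    predicate: str  # property URI
    value: str      # always a plain string, even when it holds a URI


def is_simplified(triple: SimpleTriple, allow_fragments: bool = False) -> bool:
    """Return True if the triple obeys the Simplified RDF restrictions."""
    parts = urlsplit(triple.subject)
    # The subject must be an ordinary http(s) page address.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    # Fragment URIs are an open issue, so they are rejected unless enabled.
    if parts.fragment and not allow_fragments:
        return False
    # The object is always a string; no datatypes, no URI nodes.
    return isinstance(triple.value, str)


# Hypothetical example data in the style of OGP markup.
OG_TITLE = "http://ogp.me/ns#title"
page_triple = SimpleTriple("http://example.org/movie", OG_TITLE, "The Rock")
fragment_triple = SimpleTriple("http://example.org/movie#it", OG_TITLE, "The Rock")
```

Whether repeated properties are allowed, and how language tags are carried, is left out here because those questions are still open.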

At first inspection this reduction appears to remove so much from RDF as to make it essentially useless. Our beloved RDF has been blown into a hundred pieces and scattered to the wind. It turns out, however, that it still has enough magic to reassemble itself (with a little help from its friends, http and rdfs).

This image may give a feeling for the relationship of full RDF and simplified RDF:

Reassembling Full RDF

The basic idea is that given some metadata (mostly: the schema), we can construct a new set of triples in full RDF which convey what the simplified RDF intended. The new set will be distinguished by using different predicates, and the predicates are related by schema information available by dereferencing the predicate URI. The specific relations used, and other schema information, allow us to unambiguously perform the conversion.

For example, og:title is intended to convey the same basic notion as rdfs:label. They are not the same property, though, because og:title is applied to a page about the thing which is being labeled, rather than the thing itself. So rather than saying they are related by owl:equivalentProperty, we say:

  og:title srdf:twin rdfs:label.

This ties them together, saying they are “parallel” or “convertible”, and allowing us to use other information in the schema(s) for og:title and rdfs:label to enable conversion.
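A minimal sketch of how a consumer might look up such a twin, assuming the schema triples have already been fetched by dereferencing the predicate URI. The `srdf` namespace URI and the helper name `twin_of` are hypothetical; no such vocabulary has been published.

```python
# Namespace URIs; the srdf one is a placeholder invented for this sketch.
OG = "http://ogp.me/ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SRDF = "http://example.org/srdf#"

# Schema information as plain (subject, predicate, object) tuples.
SCHEMA_TRIPLES = {
    (OG + "title", SRDF + "twin", RDFS + "label"),
}


def twin_of(predicate, schema_triples):
    """Return the full-RDF twin of a simplified predicate, or None."""
    for subject, relation, obj in schema_triples:
        if subject == predicate and relation == SRDF + "twin":
            return obj
    return None
```

A predicate with no declared twin yields `None`, in which case the consumer has no basis for converting the triple.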

The conversion goes something like this:

The subject URLs should usually be taken as pages whose foaf:primaryTopic is the real subject. (Expressing the XFN microformat in RDF provides a gentle introduction to this kind of idea.) That real subject can be identified with a blank node or with a constructed URI using a “thing described by” service such as t-d-b.org. A little more work is needed on how to make such services efficient, but I think the concept is proven. I’d expect Facebook to want to run such a service.

In some cases, the subject URL really does identify the intended subject, such as when the triple is giving the license information for the web page itself. These cases can be distinguished in the schema by indicating the simplified RDF property is an IndirectProperty or MetadataProperty.

The object (value) can be reconstructed by looking at the range of the full-RDF twin. For example, given that something has an og:latitude of “37.416343”, og:latitude and example:latitude are twins, and example:latitude has a range of xs:decimal, we can conclude the thing has an example:latitude of “37.416343”^^xs:decimal.

Similarly, the Simplified RDF technique of putting URIs in strings for the object can be undone by knowing the twin is an ObjectProperty, or has some non-Literal range.

I believe language tagging could also be wrapped into the predicate (like comment_fr, comment_en, comment_jp, etc) if that kind of thing turns out to be necessary, using OWL 2 range restrictions on the rdf:langRange facet.
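The conversion steps above can be sketched in ordinary Python. This is only an illustration under several assumptions: the schema is given as a pre-built dictionary rather than fetched from the predicate URI, the `TwinInfo` fields are invented for this sketch, the `example:` namespace is a placeholder, and the `TDB_PREFIX` pattern for “thing described by” URIs is assumed rather than taken from the service’s documentation. Language tagging is not handled.

```python
from dataclasses import dataclass
from typing import Optional

FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
TDB_PREFIX = "http://t-d-b.org/?"  # assumed URI pattern for the service


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None


@dataclass(frozen=True)
class TwinInfo:
    twin: str                             # full-RDF twin predicate
    range_datatype: Optional[str] = None  # datatype range of the twin, if any
    object_property: bool = False         # twin expects a URI node as object
    about_page: bool = False              # the page itself is the real subject


def convert(page, predicate, value, schema):
    """Turn one simplified triple into a list of full-RDF triples."""
    info = schema.get(predicate)
    if info is None:
        return []  # no twin known, so nothing can be reconstructed
    triples = []
    if info.about_page:
        subject = IRI(page)
    else:
        # The page is only about the real subject; name that subject
        # with a constructed URI and link the two.
        subject = IRI(TDB_PREFIX + page)
        triples.append((IRI(page), IRI(FOAF_PRIMARY_TOPIC), subject))
    if info.object_property:
        obj = IRI(value)  # undo the URI-in-a-string encoding
    else:
        obj = Literal(value, info.range_datatype)
    triples.append((subject, IRI(info.twin), obj))
    return triples


# Hypothetical schema entry mirroring the latitude example.
OG_LATITUDE = "http://ogp.me/ns#latitude"
EX_LATITUDE = "http://example.org/ns#latitude"
SCHEMA = {OG_LATITUDE: TwinInfo(twin=EX_LATITUDE, range_datatype=XSD_DECIMAL)}

result = convert("http://example.org/place", OG_LATITUDE, "37.416343", SCHEMA)
```

Here `result` holds two triples: one saying the page’s primary topic is the constructed subject, and one giving that subject an example:latitude typed as a decimal. A blank node could be used in place of the constructed URI.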

Next Steps

So, that’s a rough sketch, and I need to wrap this up. If you’re at ISWC, I’ll be giving a 2 minute lightning talk about this at lunch later today. But if you’ve read this far, the talk won’t say anything you don’t already know.

FWIW, I believe this is implementable in RIF Core, which would mean data consumers which do RIF Core processing could get this functionality automatically. But since we don’t have any data consumer libraries which do that yet, it’s probably easiest to implement this with normal code for now.

I think this is a fairly urgent topic because of the adoption curve (and energy) on OGP, and because it might inform the design of a standard JSON serialization for RDF, which I’m expecting W3C to work on very soon.

Posted by sandhawke
Filed in Web Architecture

9 Responses to “Simplified RDF”

Juan Sequeda Says:

November 11, 2010 at 1:31 am
+1 for making RDF simple.

But what about linked data then?
Jiri Prochazka Says:

November 11, 2010 at 3:22 am
Hi, I have also been dabbling with an idea of creating a simpler RDF subset (more like alternative – removing language tags, datatypes, maybe literals altogether in favour of data: URIs, allowing literals in all triple positions) but I don’t think it would be very beneficial. RDF is here, it isn’t really bad and we should stick with it, even if it is sometimes painful to work with. I think everyone has a bit different mindset so we are leaning into creating languages which fit ours better. It is ok to use RDF alternative, but never encourage others to publish it as RDF replacement – there always should be RDF published along with it.
David Wood Says:

November 11, 2010 at 10:19 am
I’ve heard the argument “we should stick with it, even if it is sometimes painful to work with” a lot recently in relation to RDF, http-range-14, REST, etc, and think that is a load of crap.

The Web is our invented system and it is malleable. Web Architecture is not even software; it is a social contract and one that can and should change to our changing needs.

Sure, we have a requirement not to break things a lot of others are using, but that should never be used as an excuse not to innovate out-of-band. Innovate! Bring it on!

Sandro, I like Simplified RDF a lot – although I also recognize its limitations. That’s OK. You can do a lot with the limited subset. Others can do a lot elsewhere with full RDF. Well done.
David Wood Says:

November 11, 2010 at 10:33 am
I just realized that Simplified RDF ducks the question about how to name real world things (intentionally), but then has to do so anyway when you say, “The subject URLs should usually be taken as pages whose foaf:primaryTopic is the real subject.” There may be some more to be done there 🙂
Jiri Prochazka Says:

November 11, 2010 at 11:59 am
David, sure the Web Architecture is a social contract, but so are for example natural languages. If you invent your own new Esperanto, ok cool, but if you publish all your work exclusively in it, don’t expect many people to understand it – going through the pains of learning a new language. This is why I say, always offer data in RDF along with your new format. Maybe your language will be better and people will start switching to it, but I don’t expect so if it isn’t enabling something awesome, or is *much* simpler than RDF.

Do not forget that in the field of structured data formats we are talking about, the reality is that probably all software is unable to learn new languages without programmers working on it. We still don’t have a standard to enable this – evolvability of machine languages.
sandhawke Says:

November 21, 2012 at 7:46 am
David Wood, I think something like t-d-b is the answer to the concern you raise. t-d-b should be formalized as a .well-known scheme. This way one would know from inspection that http://example.com/.well-known/tdb/some-uri denotes the same thing as http://example.org/.well-known/tdb/some-uri.

It remains to figure out how to safely minimize the escaping of the ‘some-uri’ (maybe leave off the leading “http://” or “https://”?) and what string to use for “tdb” above. And write it up and register it with IETF.
David Booth Says:

April 27, 2013 at 1:31 pm
http://lists.w3.org/Archives/Public/www-tag/2011Oct/0096.html
[[
FYI, I also described the “parallel properties” approach (in 2006),
though at that time I called it the “shadow ontology” approach:
http://lists.w3.org/Archives/Public/public-swbp-wg/2006Jan/0171.html
TimBL argued against it here:
http://lists.w3.org/Archives/Public/www-tag/2007Jun/0012.html
The refinement that Sandro described below is a kind of a combination of
that and the “Multiple-Sense Approach” also described at:
http://lists.w3.org/Archives/Public/public-swbp-wg/2006Jan/0171.html
]]
