November 10, 2010
I propose that we designate a certain subset of the RDF model as “Simplified RDF” and standardize a method of encoding full RDF in Simplified RDF. The subset I have in mind is exactly the subset used by Facebook’s Open Graph Protocol (OGP), and my proposed encoding technique is relatively straightforward.
I’ve been mulling over this approach for a few months, and I’m fairly confident it will work, but I don’t claim to have all the details perfect yet. Comments and discussion are quite welcome, on this posting or on the firstname.lastname@example.org mailing list. This discussion, I’m afraid, is going to be heavily steeped in RDF tech; simplified RDF will be useful for people who don’t know all the details of RDF, but this discussion probably wont be.
My motivation comes from several directions, including OGP. With OGP, Facebook has motivated a huge number of Web sites to add RDFa markup to their pages. But the RDF they’ve added is quite constrained, and is not practically interoperable with the rest of the Semantic Web, because it uses simplified RDF. One could argue that Facebook made a mistake here, that they should be requiring full “normal” RDF, but my feeling is their engineering decisions were correct, that this extreme degree of simplification is necessary to get any reasonable uptake.
I also think simplified RDF will play well with JSON developers. JRON is pretty simple, but simplified RDF would allow it to be simpler still. Or, rather, it would mean folks using JRON could limit themselves to an even smaller number of “easy steps” (about three, depending on how open design issues are resolved).
Cutting Out All The Confusing Stuff
Simplified RDF makes the following radical restrictions to the RDF model and to deployment practice:
The subject URIs are always web page addresses. The content-negotiation hack for “hash” URIs and the 303-see-other hack for “slash” URIs are both avoided.
(Open issue: are html fragment URIs okay? Not in OGP, but I think it will be okay and useful.)
The values of the properties (the “object” components of the RDF triples) are always strings. No datatype information is provided in the data, and object references are done by just putting the object URI into the string, instead of making it a normal URI-label node.
(Open issue: what about language tags? I think RDFa will provide this for free in OGP, if the html has a language tag.)
(Open issue: what about multi-valued (repeated) properties? Are they just repeated, or are the multiple values packing into the string, perhaps? OGP has multiple administrators listed as “USER_ID1,USER_ID2″. JSON lists are another factor here.)
At first inspection this reduction appears to remove so much from RDF as to make it essentally useless. Our beloved RDF has been blown into a hundred pieces and scattered to the wind. It turns out, however, it still has enough enough magic to reassemble itself (with a little help from its friends, http and rdfs).
This image may give a feeling for the relationship of full RDF and simplified RDF:
Reassembling Full RDF
The basic idea is that given some metadata (mostly: the schema), we can construct a new set of triples in full RDF which convey what the simplified RDF intended. The new set will be distinguished by using different predicates, and the predicates are related by schema information available by dereferencing the predicate URI. The specific relations used, and other schema information, allows us to unambiguously perform the conversion.
For example, og:title is intended to convey the same basic notion as rdfs:label. They are not the same property, though, because og:title is applied to a page about the thing which is being labeled, rather than the thing itself. So rather than saying they are related by owl:equivalentProperty, we say:
og:title srdf:twin rdfs:label.
This ties to them together, saying they are “parallel” or “convertable”, and allowing us to use other information in the schema(s) for og:title and rdfs:label to enable conversion.
The conversion goes something like this:
The subject URLs should usually be taken as pages whose foaf:primaryTopic is the real subject. (Expressing the XFN microformat in RDF provides a gentle introduction to this kind of idea.) That real subject can be identified with a blank node or with a constructed URI using a “thing described by” service such as t-d-b.org. A little more work is needed on how to make such services efficient, but I think the concept is proven. I’d expect facebook to want to run such a service.
In some cases, the subject URL really does identify the intended subject, such as when the triple is giving the license information for the web page itself. These cases can be distinguished in the schema by indicating the simplified RDF property is an IndirectProperty or MetadataProperty.
The object (value) can be reconstructed by looking at the range of the full-RDF twin. For example, given that something has an og:latitude of “37.416343”, og:latitude and example:latitude are twins, and example:latitude has a range of xs:decimal, we can conclude the thing has an example:latitude of “37.416343”^^xs:decimal.
Similarly, the Simplified RDF technique of puting URIs in strings for the object can be undone by know the twin is an ObjectProperty, or has some non-Literal range.
I believe language tagging could also be wrapped into the predicate (like comment_fr, comment_en, comment_jp, etc) if that kind of thing turns out to be necessary, using an OWL 2 range restrictions on the rdf:langRange facet.
So, that’s a rough sketch, and I need to wrap this up. If you’re at ISWC, I’ll be giving a 2 minute lightning talk about this at lunch later today. But if you’ve ready this far, the talk wont say say anything you don’t already know.
FWIW, I believe this is implementable in RIF Core, which would mean data consumers which do RIF Core processing could get this functionality automatically. But since we don’t have any data consumer libraries which do that yet, it’s probably easiest to implement this with normal code for now.
I think this is a fairly urgent topic because of the adoption curve (and energy) on OGP, and because it might possibly inform the design of a standand JSON serialization for RDF, which I’m expecting W3C to work on very soon.