November 10, 2010
I propose that we designate a certain subset of the RDF model as “Simplified RDF” and standardize a method of encoding full RDF in Simplified RDF. The subset I have in mind is exactly the subset used by Facebook’s Open Graph Protocol (OGP), and my proposed encoding technique is relatively straightforward.
I’ve been mulling over this approach for a few months, and I’m fairly confident it will work, but I don’t claim to have all the details perfect yet. Comments and discussion are quite welcome, on this posting or on the email@example.com mailing list. This discussion, I’m afraid, is going to be heavily steeped in RDF tech; simplified RDF will be useful for people who don’t know all the details of RDF, but this discussion probably won’t be.
My motivation comes from several directions, including OGP. With OGP, Facebook has motivated a huge number of Web sites to add RDFa markup to their pages. But the RDF they’ve added is quite constrained, and is not practically interoperable with the rest of the Semantic Web, because it uses simplified RDF. One could argue that Facebook made a mistake here, that they should be requiring full “normal” RDF, but my feeling is their engineering decisions were correct, that this extreme degree of simplification is necessary to get any reasonable uptake.
I also think simplified RDF will play well with JSON developers. JRON is pretty simple, but simplified RDF would allow it to be simpler still. Or, rather, it would mean folks using JRON could limit themselves to an even smaller number of “easy steps” (about three, depending on how open design issues are resolved).
Cutting Out All The Confusing Stuff
Simplified RDF makes the following radical restrictions to the RDF model and to deployment practice:
The subject URIs are always web page addresses. The content-negotiation hack for “hash” URIs and the 303-see-other hack for “slash” URIs are both avoided.
(Open issue: are html fragment URIs okay? Not in OGP, but I think it will be okay and useful.)
The values of the properties (the “object” components of the RDF triples) are always strings. No datatype information is provided in the data, and object references are done by just putting the object URI into the string, instead of making it a normal URI-label node.
(Open issue: what about language tags? I think RDFa will provide this for free in OGP, if the html has a language tag.)
(Open issue: what about multi-valued (repeated) properties? Are they just repeated, or are the multiple values packed into the string, perhaps? OGP has multiple administrators listed as “USER_ID1,USER_ID2”. JSON lists are another factor here.)
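To make the restrictions concrete, here is a minimal sketch of a check that a triple fits the Simplified RDF shape described above. It is only an illustration, not part of OGP or any specification: the names SimpleTriple and is_simplified are hypothetical, predicates are written as prefixed names purely for readability, and the allow_fragments flag stands in for the open issue about HTML fragment URIs.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SimpleTriple(NamedTuple):
    """One Simplified RDF statement (hypothetical representation)."""
    subject: str    # always a web page address
    predicate: str  # property name, e.g. "og:title"
    value: str      # always a plain string, even when it holds a URI


def is_simplified(triple: SimpleTriple, allow_fragments: bool = False) -> bool:
    """Return True if the triple obeys the restrictions sketched above."""
    parts = urlsplit(triple.subject)
    # The subject must be an ordinary web page address.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    # Fragment ("hash") URIs are an open issue; reject them by default.
    if parts.fragment and not allow_fragments:
        return False
    # The value carries no datatype or node information: just a string.
    return isinstance(triple.value, str)


page = SimpleTriple("http://example.org/movie", "og:title", "The Rock")
print(is_simplified(page))
```

Note that a value such as "http://example.org/poster.jpg" passes the check as an ordinary string; nothing in the data itself says whether it is meant as a reference.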
At first inspection this reduction appears to remove so much from RDF as to make it essentially useless. Our beloved RDF has been blown into a hundred pieces and scattered to the wind. It turns out, however, it still has enough magic to reassemble itself (with a little help from its friends, http and rdfs).
This image may give a feeling for the relationship of full RDF and simplified RDF:
Reassembling Full RDF
The basic idea is that given some metadata (mostly: the schema), we can construct a new set of triples in full RDF which convey what the simplified RDF intended. The new set will be distinguished by using different predicates, and the predicates are related by schema information available by dereferencing the predicate URI. The specific relations used, and other schema information, allow us to unambiguously perform the conversion.
For example, og:title is intended to convey the same basic notion as rdfs:label. They are not the same property, though, because og:title is applied to a page about the thing which is being labeled, rather than the thing itself. So rather than saying they are related by owl:equivalentProperty, we say:
og:title srdf:twin rdfs:label.
This ties them together, saying they are “parallel” or “convertible”, and allowing us to use other information in the schema(s) for og:title and rdfs:label to enable conversion.
The conversion goes something like this:
The subject URLs should usually be taken as pages whose foaf:primaryTopic is the real subject. (Expressing the XFN microformat in RDF provides a gentle introduction to this kind of idea.) That real subject can be identified with a blank node or with a constructed URI using a “thing described by” service such as t-d-b.org. A little more work is needed on how to make such services efficient, but I think the concept is proven. I’d expect Facebook to want to run such a service.
In some cases, the subject URL really does identify the intended subject, such as when the triple is giving the license information for the web page itself. These cases can be distinguished in the schema by indicating the simplified RDF property is an IndirectProperty or MetadataProperty.
The object (value) can be reconstructed by looking at the range of the full-RDF twin. For example, given that something has an og:latitude of “37.416343”, og:latitude and example:latitude are twins, and example:latitude has a range of xs:decimal, we can conclude the thing has an example:latitude of “37.416343”^^xs:decimal.
Similarly, the Simplified RDF technique of putting URIs in strings for the object can be undone by knowing the twin is an ObjectProperty, or has some non-Literal range.
I believe language tagging could also be wrapped into the predicate (like comment_fr, comment_en, comment_jp, etc) if that kind of thing turns out to be necessary, using an OWL 2 range restriction on the rdf:langRange facet.
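The conversion steps above can be sketched in ordinary code. This is a rough illustration under stated assumptions, not a definitive implementation: the SCHEMA dictionary stands in for whatever would be learned by dereferencing the predicate URIs, the "twin", "range", and "metadata" keys are hypothetical, example:image and example:license are made-up properties, and the real subject is always a fresh blank node rather than a URI from a “thing described by” service.

```python
# Hypothetical schema information, keyed by Simplified RDF predicate.
# "range" is "plain" (plain literal), "object" (URI reference), or a datatype.
SCHEMA = {
    "og:title": {"twin": "rdfs:label", "range": "plain"},
    "og:latitude": {"twin": "example:latitude", "range": "xs:decimal"},
    "og:image": {"twin": "example:image", "range": "object"},
    # A property about the page itself, so the subject is left alone.
    "example:pageLicense": {"twin": "example:license", "range": "object",
                            "metadata": True},
}


def reassemble(simple_triples, schema):
    """Turn (page, predicate, string) triples into full-RDF-style tuples."""
    topics = {}  # page URL -> blank node standing for the page's real subject
    full = []
    for page, predicate, value in simple_triples:
        info = schema.get(predicate)
        if info is None:
            continue  # without schema information we cannot convert
        if info.get("metadata"):
            subject = ("uri", page)
        else:
            if page not in topics:
                topics[page] = ("bnode", f"b{len(topics)}")
                full.append((("uri", page), "foaf:primaryTopic", topics[page]))
            subject = topics[page]
        # Rebuild the object from the range of the full-RDF twin.
        if info["range"] == "object":
            obj = ("uri", value)
        elif info["range"] == "plain":
            obj = ("literal", value, None)
        else:
            obj = ("literal", value, info["range"])
        full.append((subject, info["twin"], obj))
    return full


data = [
    ("http://example.org/place", "og:title", "Stanford"),
    ("http://example.org/place", "og:latitude", "37.416343"),
]
for triple in reassemble(data, SCHEMA):
    print(triple)
```

Triples whose predicate has no schema entry are simply skipped here; a real consumer might prefer to pass them through unchanged.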
So, that’s a rough sketch, and I need to wrap this up. If you’re at ISWC, I’ll be giving a 2 minute lightning talk about this at lunch later today. But if you’ve read this far, the talk won’t say anything you don’t already know.
FWIW, I believe this is implementable in RIF Core, which would mean data consumers which do RIF Core processing could get this functionality automatically. But since we don’t have any data consumer libraries which do that yet, it’s probably easiest to implement this with normal code for now.
I think this is a fairly urgent topic because of the adoption curve (and energy) on OGP, and because it might possibly inform the design of a standard JSON serialization for RDF, which I’m expecting W3C to work on very soon.
October 27, 2008
This question doesn’t usually matter very much, but sometimes we get into arguments about whether something (eg e-mail, streaming video, instant-messaging) is part of “The Web” or not.
Due to various circumstances last week I found myself drafting this text about it (while kicking myself and telling myself it was a waste of time). Jet-lag insomnia, more or less.
I think the main value of my text, or any effort like this, is in revealing our often-unstated ideas about what the Web can or should become. Is this “architecture”? When you renovate a building, adding new wings, how much are you changing its architecture? How much are you changing its subtle good and bad features? Certainly it’s good to think about that question before making the changes.
It is also nice to get our terms straight. I loathe the term “Resource” and quite dislike the term “URI”. Along the way here, I try to define what I think are the key terms in Web Architecture. For extra credit, I end up defining a “Web Standard”.
Wikipedia says the Web is:
…a system of interlinked hypertext documents accessed via the Internet.
In contrast, WebArch says it is:
…an information space in which the items of interest, referred to as resources, are identified by global identifiers called Uniform Resource Identifiers (URI)
Without going into my dislike for WebArch, I really wish the Semantic Web would/could (somehow) stick to a definition closer to Wikipedia’s.
My one-line definition is this:
The Web is a global communications medium provided by a decentralized computer system.
In more detail:
“The Web is a global…”: Although conceptually there could be many different “webs”, there is one which is understood as “The Web”. The Web follows (and uses) The Internet in being designed to connect different local systems. An installation of web technology usually ends up connected to others, becoming part of the unified global Web, because in most situations the value of doing so greatly outweighs the costs. The end effect is a single, integrated system built up of all available, connected components.
“communications medium”: The Web provides a way for people to communicate with each other. It does this by letting them create web pages (often collected into web sites), with unique names (the web address, or URL), which other people can view and interact with. The system does not restrict what exactly constitutes a “page” (sometimes called a “resource”): originally, Web pages were essentially an on-line version of paper documents, but they have evolved to now provide, within each “page”, a full user interface to remote computer systems. The web addresses are essential, because they allow people to communicate about particular pages and, crucially, they allow one page to name another (to link to another) so that users can learn about and “visit” (use) other pages.
Although generally intended for use by people, the Web is sometimes used by other computer systems. A search engine traverses the Web like a user, and then helps users find the pages they want. The Web Services and Semantic Web standards provide various ways for computer systems to interact with each other over the Web, attempting to leverage Web infrastructure as an element in new systems.
“decentralized computer system”: While the Web is in one sense a single system, it is composed of other computer systems, most of which serve as web servers or web clients. It has no central point of control (except perhaps the Domain Name System (DNS), which is part of the underlying Internet); instead, the system’s behavior for a particular user depends on the clients and servers being used by that user. Many features of the Web rely on the behavior being essentially the same for all users, and that consistency depends on the underlying systems behaving consistently. Where there is consistent behavior, and that behavior is documented, the document is a Web Standard.