From JSON to RDF in Six Easy Steps with JRON

June 4, 2010

Sometimes, if you stand in the right place and squint, JSON and RDF line up perfectly. Each time I notice this, I badly want a way to make them line up all the time, no matter where you’re standing. And, actually, I think it’s pretty easy.

I’ve seen a few proposals for how to work with RDF data in JSON, but they put too much burden on JSON folks to accommodate RDF. It seems to me we can let JSON keep doing what it does so well, and meanwhile, we can provide bits of RDF which can be adopted when needed. Instead of pushing RDF on people, allow them to take the parts they find useful.

In thinking about it, I’ve come up with six things RDF can do that are not standard parts of JSON. These are things one can do with JSON, of course, but not in any standard way. My suggestion is that these bits of functionality be provided in an RDF-compatible way (as I detail below), so that the JSON world and the RDF world can start to really play well together.

I’m interested to hear what people think of this: leave a blog comment, send me email, or catch me in the halls at SemTech. I expect this general topic of RDF-meets-JSON will be discussed at the RDF Next Steps workshop, and if the stars line up right, maybe we can get a W3C Recommendation in this space in the next year or so. Let’s call this particular proposal JRON 0.1 (JavaScript RDF Object Notation), not “Sandro’s Proposal”, so I can be freer to like other designs and be properly neutral.

Step 0: Start with ordinary JSON

In general, JSON and RDF are very similar, although they are usually described using different terminology. Of course, they both have strings and numbers. They both have a way of encoding a sequence of items: arrays in JSON, lists in RDF (some details below). The main structuring is around sets of key-value pairs, which JSON calls an ‘object’. In RDF we call that object the “subject” and focus on its connection with each key-value pair; the three together form an RDF triple.

The point here is that ordinary JSON structures correspond to an important subset of RDF. They don’t exactly match that subset because RDF uses namespaces, as detailed in step 5 below. The other steps below show the ways in which JSON is a subset of RDF. If one takes all the steps here, using JSON with these conventions, one has full RDF.

So, here are the steps. Steps 1-3 are pretty simple and not very interesting. They address everyday concerns in data processing. Steps 4-6 may be a little more surprising if you’re not familiar with RDF.

Step 1: Allow Extended Datatypes

Why: For datatypes, JSON only has strings, numbers, booleans. Sometimes people want to store and manipulate other datatypes, such as dates, or application-specific datatypes.

How: RDF uses XML’s datatype mechanism, where data values are conveyed as a pair of items: a lexical representation (a sequence of characters) and a datatype identifier (a sequence of characters which happens to be a URI). Each datatype is a mapping from strings (lexical representations) to values; the datatype identifier tells us which datatype is to be used to interpret this particular representation.

In JRON, we represent this pair like this:

{ "__repr": "2010-03-06",
  "__type": "" }

You can put this as a value in a list or in a key-value pair, just like a string or number.
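
A consumer could turn such pairs back into native values with a small lookup table keyed on the datatype identifier. The following is only a minimal sketch in Python, not part of any JRON implementation: the name `decode_typed` is hypothetical, and using the XML Schema `date` IRI as the datatype identifier is an assumption made for illustration.

```python
from datetime import date

# Assumption: the datatype identifier is the XML Schema "date" IRI.
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Map from datatype identifier to a parser for its lexical representation.
PARSERS = {XSD_DATE: date.fromisoformat}

def decode_typed(value):
    """Return a native object for a __repr/__type pair, else the value unchanged."""
    if isinstance(value, dict) and set(value) == {"__repr", "__type"}:
        parser = PARSERS.get(value["__type"])
        if parser is not None:
            return parser(value["__repr"])
    return value  # plain values and unknown datatypes pass through untouched

print(decode_typed({"__repr": "2010-03-06", "__type": XSD_DATE}))
```

Pairs with an unrecognized datatype are left as they are, so nothing is lost when a consumer does not know a type.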

RDF doesn’t restrict which datatypes are used. Some recent standards work selected this list as the set people should implement.

Personally, I’m not sure users need to be able to extend datatypes. I see dates being important, but otherwise I’m not convinced. Still, it’s in RDF, and I like compatibility, so it’s here.

Step 2: Allow Language Tags

Why: When you have text available in several different languages, language tags provide a way to select which of the available strings, if any, matches the language preference of the user.

Also: Text-to-speech systems can handle text better if they know which natural language to use in pronouncing the text.

How: RDF allows language tags on string literals. In JRON, we use a pair like this:

{ "__text": "chat",
  "__lang": "fr" }
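
To illustrate the selection use case, here is a rough Python sketch of picking a string by language preference. The function name `pick_text` is hypothetical, and the matching is a deliberately naive exact comparison of tags; real language-tag matching is more involved.

```python
def pick_text(values, preferred):
    """Return the text of the first value whose __lang equals a preferred tag."""
    for lang in preferred:
        for v in values:
            if isinstance(v, dict) and v.get("__lang") == lang:
                return v["__text"]
    # Fall back to the first plain (untagged) string, if there is one.
    for v in values:
        if isinstance(v, str):
            return v
    return None

labels = [{"__text": "chat", "__lang": "fr"},
          {"__text": "cat", "__lang": "en"}]
print(pick_text(labels, ["en", "fr"]))
```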

Commentary: Personally, I’ve never liked this bit of RDF. I feel like there are better architectures for handling language tagging. But there was a vocal community that felt this was essential, so it’s in the standard. I gather some people like it, and I haven’t seen a good counter-proposal.

Step 3: Allow Non-Tree Structures

Why: Sometimes your data is not tree structured. Sometimes you have an arbitrary directed graph, such as when representing a social network.

How: In RDF, an arbitrary “node id” is available for making non-tree structures. We can do the same in JRON, saying any object may have a node id, and if it does, the object is considered the same as all other objects with the same node id. Like this bit of JSON saying that my friend Eric and I know each other:

   { "foaf_name": "Sandro Hawke",
     "foaf_knows": { "__node_id": "n102" },
     "__node_id": "n334" }
   { "foaf_name": "Eric Prud'hommeaux",
     "foaf_knows": { "__node_id": "n334" },
     "__node_id": "n102" }

In the above example, the objects representing me and Eric are given node ids, and then those node ids are used to make the links to each other. We could also do this with only one node id, but we still need at least one:

   { "foaf_name": "Sandro Hawke",
     "foaf_knows": { "foaf_name": "Eric Prud'hommeaux",
                     "foaf_knows": { "__node_id": "n334" } },
     "__node_id": "n334" }
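
On the consuming side, "considered the same" could mean that all objects sharing a node id end up as one in-memory object. Below is a minimal Python sketch under that reading. The name `merge_nodes` is hypothetical, and for simplicity a later value for a key overwrites an earlier one.

```python
def merge_nodes(value, table=None):
    """Replace every object carrying a __node_id with one shared dict per id."""
    if table is None:
        table = {}
    if isinstance(value, list):
        return [merge_nodes(v, table) for v in value]
    if not isinstance(value, dict):
        return value
    node_id = value.get("__node_id")
    # Objects without an id stay separate; objects with one share a dict.
    node = table.setdefault(node_id, {}) if node_id is not None else {}
    for key, val in value.items():
        node[key] = merge_nodes(val, table)
    return node

docs = [
    {"foaf_name": "Sandro Hawke",
     "foaf_knows": {"__node_id": "n102"},
     "__node_id": "n334"},
    {"foaf_name": "Eric Prud'hommeaux",
     "foaf_knows": {"__node_id": "n334"},
     "__node_id": "n102"},
]
sandro, eric = merge_nodes(docs)
print(sandro["foaf_knows"] is eric, eric["foaf_knows"] is sandro)
```

The result is a cyclic structure, so it cannot be written back out as JSON without reintroducing node ids.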

Okay, those were the ordinary three things to add to JSON. Here are the interesting three:

Step 4: Allow Cross-Document Structures

Why: Sometimes, there is useful, relevant data available on the Web but it’s not part of the current JSON document. We would not want all the Web pages in the world to be gathered into one big Web page; similarly, it’s good to keep data in different documents. But that shouldn’t stop us from easily combining the data, and keeping the links intact.

How: RDF allows IRIs (unicode web addresses) to be used as node identifiers. They are like node ids, except they work across multiple documents; they are globally unambiguous identifiers, and systems can use Web protocols to dereference them to get other useful information.

In JSON, we can do this:

     { "foaf_name": "Sandro Hawke",
       "__iri": "" }

Commentary: So why do we still need __node_id? Because sometimes it’s a pain to make up a good IRI. Some people prefer to always use IRIs, avoiding node_ids in their data, and that’s fine.
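
As a rough illustration of combining data across documents, the Python sketch below groups objects from two documents by their `__iri`. Everything named here is an assumption for the example: the function `combine_by_iri`, the `example.org` IRI, the `foaf_nick` key, and the premise that both documents use the same prefixes.

```python
def combine_by_iri(*documents):
    """Group top-level objects by __iri, merging their key-value pairs."""
    merged = {}
    for doc in documents:
        for obj in doc:
            target = merged.setdefault(obj["__iri"], {})
            target.update(obj)  # later documents overwrite earlier keys
    return merged

# Hypothetical IRI and documents, for illustration only.
ME = "http://example.org/people/sandro#me"
doc_a = [{"__iri": ME, "foaf_name": "Sandro Hawke"}]
doc_b = [{"__iri": ME, "foaf_nick": "sandro"}]

combined = combine_by_iri(doc_a, doc_b)
print(sorted(combined[ME]))
```

Because the IRI is globally unambiguous, no coordination between the two documents is needed beyond using the same identifier.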

Step 5: Put Keys in Namespaces

Why: When data is coming from different sources across the Web, it’s not practical to get all the sources to agree on all the terminology. Instead, by using Web addresses (URLs/IRIs) as our keys, we allow individuals and organizations to make their own decisions. They can decide how much to share their vocabularies, and they avoid accidental name collisions. The web address also provides a handy link to documentation, community sites, schemas, etc.

How: It’s awkward to use whole, long http IRIs everywhere, so as in many RDF syntaxes, JRON has a prefix expansion mechanism, like this:

    { "foaf_name": "Sandro Hawke",
      "__prefixes": {
         "foaf_" : "" } }

Here the key “foaf_name” gets expanded into “”, which serves as a unique-on-the-Internet identifier for a particular conceptualization of names.

Commentary: Although I’ve left it almost to the end, this is the one mandatory part of this proposal. All the other elements are only present when required by the data. The null JRON document is: {"__prefixes":{}}

Others have suggested this part can be optional, too, by having a set of standard prefixes for a given API. I’m not entirely opposed to that, but I’m concerned about how those defaults would be communicated in practice.

Also, I’m not sure there’s consensus on what character to use in the short name: should it be foaf_name,, foaf:name, or what? The mechanism here is that you can use whatever you want: the __prefixes table keys are matched longest-first. If there’s an entry with an empty string, that provides a default namespace.
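
The longest-first matching rule can be sketched in a few lines of Python. This is an illustration rather than a reference implementation: the name `expand_key` is hypothetical, the FOAF namespace IRI and the `example.org` default namespace are assumed values, and leaving keys that start with `__` unexpanded is my own assumption about how reserved keys would be treated.

```python
def expand_key(key, prefixes):
    """Expand a short key to a full IRI using the longest matching prefix."""
    if key.startswith("__"):
        return key  # assumption: reserved JRON keys are never expanded
    for short in sorted(prefixes, key=len, reverse=True):
        if key.startswith(short):
            return prefixes[short] + key[len(short):]
    return key  # no prefix matched

# Assumed namespace IRIs; the empty-string entry acts as the default namespace.
prefixes = {"foaf_": "http://xmlns.com/foaf/0.1/",
            "": "http://example.org/vocab#"}
print(expand_key("foaf_name", prefixes))
print(expand_key("title", prefixes))
```

Since every key starts with the empty string, the empty entry matches last and only catches keys that no longer prefix claimed.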

Step 6: Allow Multiple Values Per Key

Why: Sometimes it makes sense to have more than one value for some property. For instance, as it turns out, I have more than one friend. I could use a single-value ‘list-of-friends’ property, but sometimes it makes more sense to use a ‘friend’ property that has multiple values. In particular, if we’ll be learning who my friends are from multiple sources, and we were using lists, what order would we put the resulting combined list in?

How: We still just use JSON lists, but we indicate that the order does not matter, so the values can be merged arbitrarily:

   { "": "Sandro Hawke",
     "foaf.knows": { "__values": [
                     { "": "Eric Prud'hommeaux" },
                     { "": "Dan Brickley" },
                     { "": "Matt Womer" } ] } }
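
Because order does not matter, values learned from different sources can simply be pooled. The Python sketch below shows one way that might look. The name `merge_values` and the `foaf_name` keys are assumptions, and de-duplicating by structural equality is a simplification: a real implementation would compare node ids or IRIs instead.

```python
def merge_values(a, b):
    """Merge two __values wrappers, ignoring order and dropping duplicates."""
    merged = []
    for item in a["__values"] + b["__values"]:
        if item not in merged:
            merged.append(item)
    return {"__values": merged}

source_1 = {"__values": [{"foaf_name": "Eric Prud'hommeaux"},
                         {"foaf_name": "Dan Brickley"}]}
source_2 = {"__values": [{"foaf_name": "Dan Brickley"},
                         {"foaf_name": "Matt Womer"}]}

friends = merge_values(source_1, source_2)
print(len(friends["__values"]))
```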

Closing Thoughts

That’s it. Those are the six things that RDF does that normal JSON doesn’t do. Did I miss something?

The API I’m imagining (but haven’t built yet) would have a few features like:

jron_reprefix(tree, desired_prefixes)
Returns another JRON tree with all the prefixes matching the ones provided here. If you’re going to use foaf, for instance, you probably want to set a prefix like “foaf.” for foaf, so your code can expect it.
jron_merge_nodes(tree) and jron_treeify(tree)
Convert a tree (suitable for transmitting) to or from a graph (suitable for use in memory).
Would convert all the __type/__repr objects into suitable local objects, if they exist. Maybe even date/time objects, if there’s a suitable library installed for those.
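
Since none of this is built, here is only a speculative Python sketch of what jron_reprefix might do: expand each key with the tree's own prefixes, then shorten it again with the desired ones. The behavior for keys that match no desired prefix (keep the full IRI) and for reserved `__` keys (leave alone) is my assumption, as is the FOAF namespace IRI in the example.

```python
def jron_reprefix(tree, desired_prefixes):
    """Return a copy of tree whose keys use desired_prefixes where possible."""
    old = tree.get("__prefixes", {})

    def expand(key):
        for short in sorted(old, key=len, reverse=True):
            if key.startswith(short):
                return old[short] + key[len(short):]
        return key

    def compress(iri):
        for short, namespace in desired_prefixes.items():
            if iri.startswith(namespace):
                return short + iri[len(namespace):]
        return iri  # no desired prefix applies; keep the full IRI

    def walk(value):
        if isinstance(value, list):
            return [walk(v) for v in value]
        if isinstance(value, dict):
            out = {}
            for key, val in value.items():
                if key == "__prefixes":
                    continue  # replaced at the top level below
                new_key = key if key.startswith("__") else compress(expand(key))
                out[new_key] = walk(val)
            return out
        return value

    result = walk(tree)
    result["__prefixes"] = dict(desired_prefixes)
    return result

FOAF = "http://xmlns.com/foaf/0.1/"  # assumed namespace IRI
tree = {"foaf_name": "Sandro Hawke", "__prefixes": {"foaf_": FOAF}}
print(jron_reprefix(tree, {"foaf.": FOAF}))
```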

One technical issue for RDF folks:

Should JSON arrays be considered RDF Lists or RDF Sequences? Perhaps they default to RDF Lists but there can be an option flag in the top-level object:

 { ...
   "__json_array_is_rdf_seq": true }

When that flag is absent or false, arrays would be considered RDF Lists. My sense is no one needs to use both. Maybe soon we’ll know if RDF Sequences can finally be deprecated.
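
For readers less familiar with the two RDF encodings, this Python sketch shows the triples each choice would produce for one JSON array. The function name `array_to_triples` and the locally generated blank node labels are assumptions; the rdf:first, rdf:rest, rdf:nil, rdf:Seq and rdf:_1 style terms are the standard RDF vocabulary.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def array_to_triples(items, as_seq=False):
    """Return (head, triples) encoding a JSON array as an RDF Seq or RDF List."""
    ids = (f"_:b{n}" for n in count())  # local blank node labels
    triples = []
    if as_seq:
        # RDF Sequence: one node with numbered membership properties.
        head = next(ids)
        triples.append((head, RDF + "type", RDF + "Seq"))
        for i, item in enumerate(items, start=1):
            triples.append((head, f"{RDF}_{i}", item))
        return head, triples
    if not items:
        return RDF + "nil", triples
    # RDF List: a linked chain of cells terminated by rdf:nil.
    cells = [next(ids) for _ in items]
    for i, (cell, item) in enumerate(zip(cells, items)):
        rest = cells[i + 1] if i + 1 < len(cells) else RDF + "nil"
        triples.append((cell, RDF + "first", item))
        triples.append((cell, RDF + "rest", rest))
    return cells[0], triples

for triple in array_to_triples(["a", "b"])[1]:
    print(triple)
```

The List form costs two triples per item but has an explicit end, while the Seq form costs one triple per item plus a type triple and has no closing marker.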

30 Responses to “From JSON to RDF in Six Easy Steps with JRON”

  1. Joe Says:

    Couldn’t the last point be:

    { "": "Sandro Hawke",
      "foaf.knows": [
        { "": "Eric Prud'hommeaux" },
        { "": "Dan Brickley" },
        { "": "Matt Womer" } ] }

    and just recognise that that property is unordered?

  2. drewp Says:

    Will people make RDF lists too often, when all they really mean is multiple values? I didn’t catch how exactly you want RDF lists to look, but I am guessing it might be simpler than the __values thing.

    People use JSON lists in tons of places that they don’t actually care about order. I don’t want them to make RDF data that’s needlessly hard to use.

    Would it be possible to make some kind of __unordered tag that is easier to stick on a list, or better yet, assume that every JSON list’s order doesn’t matter unless the user requests __ordered again?

    Also, (from MongoDB) has already used “_id” for almost the same purpose as your “__node_id”. I’m a fan of _id and single-underscore in general. I don’t see the need for working so hard to avoid collisions with other keys. You’ll never win by adding more underscores; you need an escaping system.

  3. sandhawke Says:

    Yeah, actually a lot of this can be done simpler if we put more into a property (key) metadata. That’s what Tim Berners-Lee argued for, but the RDF Core Working Group decided (back in 2002 or 2003) that it was more important to have RDF-proper still work when it was off-line. I suspect it’s time to revisit that decision.

    With that, we could have the datatype information, language information, and ordered-or-not information (steps 1, 2, and 6) be inside the property metadata. Since node_id and iri can easily be combined, that leaves us with only two steps: iri object identifiers and namespaces.

    The problem is that the RDF community will have to take this step towards JSON in using property metadata. I don’t know if we can get it to take that step.

  4. Tom Says:

    A dumb hack, but can’t you get around the date type issue by just standardizing on Unix timestamps?

  5. sandhawke Says:

    re: timestamps – you mean just send a number, and that number happens to be the number of seconds since 1970-01-01 ? From an RDF perspective, the big problem with that is that RDF really wants to know it’s a date, instead of a number. But (as I mentioned above) maybe there are other ways to say that, like making it part of the definition of the property.

  6. +1 for moving as much as possible into property metadata (or into an RDFa 1.1 style “profile” document that can be re-used throughout the site or even wider). The future belongs to RDF syntaxes that move all the RDF noise (datatypes, namespace mappings, prefixes) out of the instance documents.

  7. Nathan Says:

    Finally read this the three times which I read everything worth reading properly 🙂

    general comments:

    ‘Allow Extended Datatypes’ fully agree, extended datatypes (and even just specifying the datatype) is a definite must imho.

    The one node example under ‘Allow Non-Tree Structures’ really looks like a reference / pointer to me, and a nice approach.

    ‘Allow Cross-Document Structures’ possibly my favourite of all; I really like the way you’re thinking here.. going to give it some thought myself.

    re: what character to use in the short name: should it be foaf_name,, foaf:name, or what? – if you allow a default empty namespace to be specified then collisions could occur when using underscore; dot/period would indicate that you were dealing with the property of an object (say data.foaf.knows) whereas I believe colons would remain unambiguous and are nice and familiar…?

    Need to think more about this, I certainly think you have something here, good effort.

  8. Seth Russell Says:

    I think we should allow multiple unordered values simply:
    ” sru:dwelled” : [“Los Angeles”, “San Francisco”, “Seattle”]

    Regarding your API, i think we urgently need smooching operators:

    Let “-” be a collection of subjects { {“v1” : “o1” , “v2” : “o2” , …} , …}
    Then we need an operator to do something like this …
    {-} + {-} —smoch—> { – , – }
    But it should do more than just concatenate the clauses within. It should at least smooch the subjects with the same identifiers.

    Then i think we should finally define a context property in the document header for JRON. If it is not present in the RDF, well then it can be (should be) provided from the dialogue context of the agent processing it. Perhaps it could look something like …

    “_context” : [“”, “”],
    “_iri” : “”,
    “_prefixes” :
    “foaf_”: “” ,
    “dc_” : “”,
    “foaf:PersonalProfileDocument” :
    “dc:title” : “Sandro’s (Mostly-Professional) Profile Document”,
    “foaf:maker” : “Sandro Hawke”,
    “foaf:primaryTopic” : “Sandro Hawke”
    {-}, …

  9. Tom Morris Says:

    For ideas and inspiration you should look at the Metaweb Query Language (MQL) that was developed for use with Freebase. It covers a lot of the ground that you’re describing and has several years of real world usage.

  10. David Karger Says:

    Sandro, you propose annotating individual objects (of statements) using e.g. datatype (Step 1), language (Step 2), and node ID (Step 3). Consider that in a typical blob of data these are likely to be consistent—e.g., the value of a given property is always a date, or a resource with an ID, or in a particular language. Annotating each value makes for lots of redundant annotations and thus clutter. For this reason Exhibit, in line with your comments, lets you annotate _predicates_ with the information about their objects: thus, for example, in exhibit the fragment
    birthDate: {valueType: “date”}
    lets you specify that the value of any object of a birthDate statement is of type date. Similarly you can specify
    knows: {valueType: “item”}
    to specify that the string that is the object of the knows property should be interpreted as a resource identifier. Exhibit doesn’t care whether this info is in the same file or a different one; I don’t know why that matters as some comments imply. Also, I don’t follow the argument that in order to do this we have to get the rdf community to change something. This annotation on a property is a hint/default; it can be overridden by annotating elsewhere, so why does it matter if regular rdf doesn’t use it?

    Your use of node_id and iri is consistent with Exhibit’s, but Exhibit plays another trick: in the default case that no iri is given, it _implicitly_ derives an iri from the node_id (by prepending the url of the page containing the data) which is extremely useful if people go about copy/pasting the data from the Exhibit.

    Also, in addition to node_id and iri, Exhibit starts with a label property; the label is intended to be a _human readable_ identifier of the node, and Exhibit expects these even more than node_ids. That is, node_ids are not required; instead, labels are required (so we have a basic way to show a node to someone) and the id is defaulted to equal the label if it is not given explicitly. ids can still be given explicitly to distinguish two nodes with the same label.

    Exhibit’s alternative to prefixes lets you avoid figuring out special prefix syntax. In exhibit, any property can be given a uri:
    knows: {uri: “” }
    this binds knows to a particular property for the duration of the file, and lets the author disagree entirely with the “official” name of the predicate if they like (for example, if they want to use a different language).

    Exhibit uses arrays for multiple values without paying attention to the order. This has always been very convenient and I think actually ordered values are rarer, so I’d prefer to see the basic array represent an unordered list, then give some other annotation (perhaps on the property’s valueType) if I want to specify ordered lists instead.

  11. Hi Sandro,

    Comparing to the discussions in [1] that resulted in the further simplified JSON encoding in the linked-data-api [2] this seems fairly similar.

    Step 1. We use “lex^^datatype” syntax for “extension” datatypes but also include xsd:dateTime in the default data types (mapping to JavaScript compatible date format strings). Though the API configuration can default to just transmitting lexical form.

    Step 2. We use “string@lang” syntax. Though the API configuration can default to dropping lang tags.

    Step 3. We use _id instead of __node_id but otherwise similar for multiple references, use structure nesting for single reference bNodes.

    Step 4. We use _about instead of __iri but otherwise similar.

    Step 5. Our context mapping (got a bit lost in [2], will get that sorted) is mostly just an array of shortname/iri pairs. The API configuration falls back on using “prefix_name” as the shortname as you suggest but the prefix processing is not exposed in the serialization – if someone wants to invert to get the IRI it is just a simple lookup.

    Step 6. Like Exhibit we default to arrays for multiple values and don’t distinguish ordered and non-ordered. The API configuration can force use of arrays for given properties, even if particular instances are single/null valued, for ease of consumption.

    A lot of the differences are because we want the JSON to look reasonable to a JSON developer and don’t require round-tripping (though in fact round-tripping works except when properties have ambiguous ranges).




  12. Nathan Says:

    can you post a snippet which includes multiple subjects?



  13. Graham Klyne Says:

    Nice approach. I do like the way you handle prefixes.

    On the subject of annotating predicates, I think it’s probably OK but I’ll point out that the RDF WG considered something of the kind for datatyping, but didn’t go that way because it would have created logical non-monotonicity problems. As long as this is just a syntactic representation for the underlying RDF abstract syntax, that’s fine, but if we actually get into the territory of adding triples from another source being able to change the interpretation of a property then we’re back into non-monotonic territory.

    Also, concerning “… important to have RDF-proper still work when it was off-line” I think that _is_ important. I don’t want my applications to stop working because I’ve had a break in Internet connectivity, and I think we’re still a long way from having reliably ubiquitous connections. That said, I think a lot of the issues can be handled better. For example, interpretation properties could deal with typing and language issues in the ways you suggest without necessarily creating local application failures when network connectivity breaks. But, for now, we do have to deal with RDF as it is.

  14. sandhawke Says:

    Graham, I wouldn’t annotate the same property, I would make two different, related properties, in order to avoid the problems you mention.

    For example:

        eg:Aubrey eg:age "8"^^xs:int.
        eg:Aubrey eg:ageString "8".

    … and then somehow declare that the value of eg:ageString is the lexical representation using the datatype xs:decimal of the eg:age property.

    This could be declared in RIF Core, but should probably be done in a more abstract way, which can then be generically processed using RIF Core or custom code.

    Maybe it can be expressed like:

       eg:age parallelProperty eg:ageString

    and the semantics of parallelProperty involve looking at the range of its subject and value.

  15. Graham Klyne Says:

    Sandro – indeed.

    I’m not sure I’d go to the trouble of creating parallel properties here – just use one approach or the other for each property, and use rules or similar to align usage when different styles come together. In my own work, I’m finding that a lot of information is initially presented as strings, and more detailed interpretation of the content comes later, so starting with a string value comes naturally. Transformation to different properties with datatyped values comes later, and is often an internal (i.e. efficiency) matter for a particular application.

    I’m not sure that simply saying “parallelProperty” without indicating the nature of the relationship (e.g. via rule transformations) is particularly helpful.

  16. Graham Klyne Says:

    Sandro, my last comment – I missed the bit you said about parallelProperty semantics. But I’m not entirely convinced that’s enough.

  17. Graham Klyne Says:

    FYI: I’m looking at doing a Javascript implementation of this for use with RDFQuery. My initial notes and test cases are here:

  18. Graham Klyne Says:

    What do you propose as the “top-level” JRON structure?

    On first reading, I assumed it would be an object, corresponding to a single RDF subject. But that doesn’t allow multiple statements about different subjects, such as would typically appear within an element.

    A logical alternative would be a JavaScript array, which would need to be treated differently from the property arrays which you have described – but I don’t see that’s particularly problematic.

    Reducing it to a test case, how to represent this?:

    Sandro Hawke

    Eric Prud’hommeaux

  19. Graham Klyne Says:

    The RDF/XML example was stripped out of my last comment. Trying again:

    How to represent this:

    <?xml version="1.0"?>
    <rdf:Description rdf:nodeID="n334">
    <foaf:name>Sandro Hawke</foaf:name>
    </rdf:Description>
    <rdf:Description rdf:nodeID="n102">
    <foaf:name>Eric Prud'hommeaux</foaf:name>
    </rdf:Description>

  20. sandhawke Says:

    I like the idea of the whole expression representing a value, so the whole thing is recursive. So following that idea, we can’t use a plain JS array. What we can do is use the multivalue structure. So we get:

    { "__values": [
        { "": "Sandro Hawke",
          "__node_id": "n334" },
        { "": "Eric Prud'hommeaux",
          "__node_id": "n102" } ],
      "__prefixes": {
        "foaf." : "" } }

    I can’t quite decide if that’s too confusing. I’m also not sure how common this case is; I think the typical JSON user is used to always having a JSON expression be one object, so they wouldn’t think to want this. For them sending one object or one array is fine (and more can be nested if necessary.)

  21. Graham Klyne Says:

    I agree using a plain JS array is unattractive. If only because it leaves nowhere to easily hang the prefixes.

    As for the use-case – you may be right, but if trying to round-trip between JSON and “real” RDF, then some way of handling multiple statements is needed (currently, my code dies if asked to serialize statements with multiple subjects to JRON). So maybe it doesn’t matter if it’s less pretty, as long as it’s possible.

    Rather than using __values here, maybe use something like __statements? Not sure if that helps or hinders.

  22. Graham Klyne Says:

    I’m pretty much done with my JRON implementation for now: notes at

    This implementation is coded in Javascript as a jQuery plugin that works with the rdfquery library. It converts data between JRON object structures and rdfquery databank objects.

    Currently, part of the Shuffl project (

  23. […] also think simplified RDF will play well with JSON developers. JRON is pretty simple, but simplified RDF would allow it to be simpler still. Or, rather, it would mean […]

  24. […] for a JSON serialization that appeals to non-RDF-folks. This was something I was interested in (cf JRON) but I agree there’s too much design work to do in a Working Group like this. The door was […]

  25. Alex Says:

    “2: Allow Language Tags…Personally, I’ve never liked this bit of RDF”

    I sympathize with your distaste. Languages and datatype could have been predicates of the named tuple or object, which would have given languages more flexibility in different domains, including more properties on the languages themselves.

    On the other hand, the language tag drastically reduces ambiguity and processing cost in a collaborative multi-lingual dictionary. But that’s a datastore implementation detail, not something that should be encoded in the abstract or serialization formats.

    JSON Triple Sets flattens the triple (octuple?) with s, p, o, s_type, p_type, o_type, o_datatype, o_lang. This could be implemented with a consistent hierarchy:

    {s_value, s_type,
    p_value, p_type,
    o_value, o_type, o_datatype, o_lang}


    { s_value,
    p : [{ p_value,
    o : [{ o_value,

  26. Alan Mullane Says:

    Great post!

    Any updates on where this RDF/JSON mapping is now?

    I think W3C were looking at this last year but not sure where it’s at yet – found this link during my travels around JSON/RDF –

    Also, does anyone know yet of a java implementation of the above mapping (or others) to convert RDF To JSON and back for use in a web app?

    If not I might post an implementation up somewhere and link back here later

    I did find the following implementations but have not tried them in anger yet and not sure what mapping they use (as I said, I’d prefer to use an implementation of the above mapping though).

    I’m looking at serializing RDF/JSON in Java Spring MVC using Spring-Jena and an extension of the Spring-Jackson JSON/XML serializer from (have to implement this part though) but if someone knows if this has been done already it might save me some time.


  27. sandhawke Says:

    Any updates on where this RDF/JSON mapping is now?

    It’s being actively worked at W3C under the name “JSON-LD”. See or

  28. Chengcong Says:

    I am quite new to RDF and JSON. Do I need an ontology before converting JSON or any other formatted data to RDF?

  29. sandhawke Says:

    You need to have some idea what properties you’re going to use, and you need to pick URLs for those properties, or find someone who already has. You don’t need to pick classes unless you want to, and you don’t need to use OWL, although some people find it useful.

  30. […] pointed to @sandhawke‘s proposition to convert RDF to JSON. Read his piece From JSON to RDF in Six Easy Steps with JRON first. Below, co-posting my comment I made there also on my […]
