GrowJSON
June 30, 2014
I have an idea that I think is very important but I haven’t yet polished to the point where I’m comfortable sharing it. I’m going to share it anyway, unpolished, because I think it’s that useful.
So here I am, handing you a dull, gray stone, and I’m saying there’s a diamond inside. Maybe even a dilithium crystal. My hope is that a few experts will see what I see and help me safely extract it. Or maybe someone has already extracted it, and they can just show me.
The problem I’m trying to solve is at the core of decentralized (or loosely-coupled) systems. When you have an overall system (like the Web) composed of many subsystems which are managed on their own authority (websites), how can you add new features to the system without someone coordinating the changes?
RDF offers a solution to this, but it turns out to be pretty hard to put into practice. As I was thinking about how to make that easier, I realized my solution works independently of the rest of RDF. It can be applied to JSON, XML, or whatever. For now, I’m going to start with JSON.
Consider two on-the-web temperature sensors:
> GET /temp HTTP/1.1
> Host: paris.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
{"temp":35.2}
> GET /temp HTTP/1.1
> Host: berkeley.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
{"temp":35.2}
The careful human reader will immediately wonder whether these temperatures are in Celsius or Fahrenheit, or if maybe the first is in Celsius and the second Fahrenheit. This is a trivial example of a much deeper problem.
Here’s the first sketch of my solution:
> GET /temp HTTP/1.1
> Host: paris.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
  {"GrowJSONVersion": 0.1,
   "defs": {
     "temp": "The temperature in degrees Fahrenheit as measured by a sensor and expressed as a JSON number"
   }
  },
  {"temp":35.2}
]
> GET /temp HTTP/1.1
> Host: berkeley.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
  {"GrowJSONVersion": 0.1,
   "defs": {
     "temp": "The temperature in degrees Fahrenheit as measured by a sensor and expressed as a JSON number"
   }
  },
  {"temp":35.2}
]
I know it looks ugly, but now it’s clear that both readings are in Fahrenheit.
My proposal is that much like some data-consuming systems do schema validation now, GrowJSON data-consuming systems would actually look for that exact definition string.
This way, if a third sensor came online:
> GET /temp HTTP/1.1
> Host: doha.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
  {"GrowJSONVersion": 0.1,
   "defs": {
     "temp": "The temperature in degrees Celsius as measured by a sensor and expressed as a JSON number"
   }
  },
  {"temp":35.2}
]
the software could automatically determine that it does not contain data in the format it was expecting. In this case, a human could easily read the definition and make the software handle both formats.
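A minimal sketch of what such a consumer might do, in Python. It assumes the two-element layout used in the examples above (a header object carrying "defs", then the data object); the function name read_by_definition and the sample documents are hypothetical, not part of any GrowJSON specification.

```python
import json

# The exact definition text this consumer was written against.
FAHRENHEIT_DEF = (
    "The temperature in degrees Fahrenheit as measured by a sensor "
    "and expressed as a JSON number"
)


def read_by_definition(document_text, expected_definition):
    """Return the value whose key carries exactly the expected definition.

    Returns None when no key in the document is defined with that exact
    text, which signals that the data is not in the expected format.
    """
    header, data = json.loads(document_text)
    for key, definition in header.get("defs", {}).items():
        # Plain string equality: no fuzzy matching, no normalization.
        if definition == expected_definition and key in data:
            return data[key]
    return None


# Hypothetical stand-ins for the Paris and Doha responses.
paris = json.dumps([
    {"GrowJSONVersion": 0.1, "defs": {"temp": FAHRENHEIT_DEF}},
    {"temp": 35.2},
])
doha = json.dumps([
    {"GrowJSONVersion": 0.1,
     "defs": {"temp": FAHRENHEIT_DEF.replace("Fahrenheit", "Celsius")}},
    {"temp": 35.2},
])

print(read_by_definition(paris, FAHRENHEIT_DEF))  # 35.2
print(read_by_definition(doha, FAHRENHEIT_DEF))   # None
```

Because the lookup goes through the definition text rather than the key, the consumer never depends on the name "temp" itself.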
That’s the essence of the idea. Any place you might have ambiguity or a naming collision in your JSON, instead use natural language definitions that are detailed enough that (1) two people are very unlikely to choose the same text, (2) if they did, they’re extremely likely to have meant the same thing, and, while we’re at it, (3) the text helps people implement code to handle it.
I see you shaking your head in disbelief, confusion, or possibly disgust. Let me try answering a few questions:
Question: Are you really suggesting every JSON document would include complete documentation of all the fields used in that JSON document?
Conceptually, yes, but in practice we’d want to have an “import” mechanism, allowing those definitions to be in another file or Web Resource. That might look something like:
> GET /temp HTTP/1.1
> Host: paris.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
  {"GrowJSONVersion": 0.1},
  {"import": "http://example.org/schema",
   "requireSHA256": "7998bb7d2ff3cfa2666016ea0cd7a379b42eb5b0cebbb1142d8f086efaccfbc6"
  },
  {"temp":35.2}
]
> GET /schema HTTP/1.1
> Host: example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
  {"GrowJSONVersion": 0.1,
   "defs": {
     "temp": "The temperature in degrees Fahrenheit as measured by a sensor and expressed as a JSON number"
   }
  }
]
Question: Would that break if you didn’t have a working Internet connection?
No. By including the SHA-256 hash, we make it clear the bytes aren’t allowed to change, so the data-consumer can hard-code the results of a retrieval done at build time.
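One way this could work offline, sketched in Python under some assumptions: the consumer ships a build-time copy of the schema bytes, keyed by digest, and resolves an import purely from that copy. The names resolve_import and BUILD_TIME_CACHE are hypothetical, and the digest is computed from the sample bytes in the sketch rather than taken from the example above, since the exact bytes behind that hash are not shown.

```python
import hashlib
import json

# Schema bytes captured once at build time (hypothetical content).
FROZEN_SCHEMA = json.dumps([
    {"GrowJSONVersion": 0.1,
     "defs": {"temp": "The temperature in degrees Fahrenheit as measured "
                      "by a sensor and expressed as a JSON number"}},
]).encode("utf-8")

PINNED_DIGEST = hashlib.sha256(FROZEN_SCHEMA).hexdigest()

# Local store of frozen schemas, keyed by their SHA-256 hex digest.
BUILD_TIME_CACHE = {PINNED_DIGEST: FROZEN_SCHEMA}


def resolve_import(import_obj, cache):
    """Return the imported defs without touching the network."""
    digest = import_obj["requireSHA256"]
    body = cache.get(digest)
    if body is None:
        raise LookupError("no build-time copy pinned to " + digest)
    # Re-check the bytes so a corrupted or edited copy is rejected.
    if hashlib.sha256(body).hexdigest() != digest:
        raise ValueError("cached schema does not match requireSHA256")
    header = json.loads(body)[0]
    return header["defs"]


IMPORT_OBJ = {"import": "http://example.org/schema",
              "requireSHA256": PINNED_DIGEST}
print(resolve_import(IMPORT_OBJ, BUILD_TIME_CACHE)["temp"])
```

The URL is never dereferenced here; it only says where the frozen bytes originally came from, while the digest decides which bytes are acceptable.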
Question: Would the data-consumer have to copy the definition without changing one letter?
Yes, because the machines don’t know which letters might be important. In practice the person programming the data-consumer could do the same kind of import, referring to the same frozen schema on the Web, if they want to. Or they can just cut-and-paste the definitions they are using.
Question: Would the object keys still have to match?
No, only the definitions. If the Berkeley sensor used tmp instead of temp, the consumer would still be able to understand it just the same.
Question: Is that documentation string just plaintext?
I’m not sure yet. I wish markdown were properly standardized, but it’s not. The main kind of formatting I want in the definitions is links to other terms defined in the same document. Something like these [[term]] expressions:
{"GrowJSONVersion": 0.1,
 "defs": {
   "temp": "The temperature in degrees Fahrenheit as measured by a sensor at the current [[location]] and expressed as a JSON number",
   "location": "The place where the temperature reading [[temp]] was taken, expressed as a JSON array of two JSON numbers, being the longitude and latitude respectively, expressed as per GRS80 (as adopted by the IUGG in Canberra, December 1979)"
 }
}
As I’ve been playing around with this, I keep finding good documentation strings include links to related object keys (properties), and I want to move the names of the keys outside the normal text, since they’re supposed to be able to change without changing the meaning.
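As a rough illustration of how a tool might treat these links, the Python sketch below pulls the [[term]] references out of each definition and reports any that point at a key not defined in the same "defs" object. The [[term]] syntax is still tentative, and the helper names linked_terms and unresolved_links are hypothetical.

```python
import re

# Matches [[term]] and captures the term between the double brackets.
LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def linked_terms(definition):
    """Return the terms referenced by [[...]] links, in order of appearance."""
    return LINK.findall(definition)


def unresolved_links(defs):
    """Map each key to the linked terms it mentions that defs does not define."""
    problems = {}
    for key, text in defs.items():
        missing = [term for term in linked_terms(text) if term not in defs]
        if missing:
            problems[key] = missing
    return problems


# Shortened, hypothetical definitions in the style of the example above.
defs = {
    "temp": "The temperature in degrees Fahrenheit as measured by a sensor "
            "at the current [[location]] and expressed as a JSON number",
    "location": "The place where the temperature reading [[temp]] was taken",
}

print(linked_terms(defs["temp"]))                # ['location']
print(unresolved_links(defs))                    # {}
print(unresolved_links({"temp": defs["temp"]}))  # {'temp': ['location']}
```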
Question: Can I fix the wording in some definition I wrote?
Yes, clearly that has to be supported. It would be done by keeping around the older text as an old version. As long as the meaning didn’t change, that’s okay.
Question: Does this have to be in English?
No. There can be multiple languages available, just like having old versions available. If any one of them matches, it counts as a match.
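How old versions and translations would be stored is not settled above. Purely as an assumption for illustration, the Python sketch below lets each entry in "defs" be either one string or a list of alternative texts, and treats a key as matching when any alternative equals any text the consumer accepts. The list form, the French wording, and the name find_key are all hypothetical.

```python
OLD_EN = "The temperature in degrees Fahrenheit as measured by a sensor"
NEW_EN = ("The temperature in degrees Fahrenheit as measured by a sensor "
          "and expressed as a JSON number")
FR = ("La température en degrés Fahrenheit mesurée par un capteur "
      "et exprimée sous forme de nombre JSON")


def alternatives(entry):
    """Normalize a defs entry (one string or a list of strings) to a set."""
    return {entry} if isinstance(entry, str) else set(entry)


def find_key(defs, accepted_texts):
    """Return the first key with at least one definition the consumer accepts."""
    accepted = set(accepted_texts)
    for key, entry in defs.items():
        if alternatives(entry) & accepted:
            return key
    return None


# The publisher keeps the older wording alongside the current one.
publisher_defs = {"tmp": [NEW_EN, OLD_EN]}

print(find_key(publisher_defs, [OLD_EN]))  # tmp
print(find_key(publisher_defs, [FR]))      # None
```

A consumer written against the old wording still finds the value, even under the different key "tmp", while one that only knows the French text gets no match until the publisher lists that text too.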
July 1, 2014 at 12:18 am
I’d suggest returning the text descriptions for fields via an HTTP OPTIONS request. Including the definitions for every JSON document served is really inefficient. Also when a RESTful interface is defined you can more explicitly ask for definitions – which is needed for your importing mechanism.
Interestingly, if you look at a REST API generated by Django REST Framework it will populate the OPTIONS for your API automatically – and the human can simply put a Python DocString in their code to describe interfaces and attributes… For example this is what I get back from an OPTIONS request from one of my projects.
Bill List
A viewset for all Bills
OPTIONS /api/v1/bills/
HTTP 200 OK
Content-Type: application/json
Vary: Accept
Allow: GET, POST, HEAD, OPTIONS
{
  "name": "Bill List",
  "description": "A viewset for all Bills",
  "renders": [
    "application/json",
    "text/html"
  ],
  "parses": [
    "application/json",
    "application/x-www-form-urlencoded",
    "multipart/form-data"
  ],
  "actions": {
    "POST": {
      "owner_name": {
        "type": "field",
        "required": false,
        "read_only": true
      },
      "creator": {
        "type": "field",
        "required": true,
        "read_only": false
      },
      "due_date": {
        "type": "date",
        "required": true,
        "read_only": false,
        "label": "due date"
      },
      "total": {
        "type": "decimal",
        "required": true,
        "read_only": false,
        "label": "total"
      },
      "bill_name": {
        "type": "string",
        "required": true,
        "read_only": false,
        "label": "bill name",
        "max_length": 32
      },
      "bill_description": {
        "type": "string",
        "required": false,
        "read_only": false,
        "label": "bill description",
        "max_length": 32
      }
    }
  }
}
July 1, 2014 at 7:39 am
I believe the Web already does that with media types and Link headers. I’d be curious to know your opinion on where they diverge.
July 1, 2014 at 7:53 am
@danieldevine86 I’ll have to look more at what Django is doing there. In general, I don’t like the idea of providing content via OPTIONS, since it means that content isn’t available via GET, so it’s not really on the web. As I said, I’m not suggesting one would normally inline all the definitions, just link to them with some kind of “import”. Also, yes, some kind of HTTP mechanism would be good for places where you can PUT or POST some data to advertise the kind of data they understand/allow. OPTIONS might be good for that, but it should just provide a link to the suitable definitions, I think, not the text of those definitions.
@bertails I don’t quite understand. You certainly can’t use a different media type which happens to be textually defined the same way and get the same effect….
July 1, 2014 at 7:06 pm
It seems to me like you’re describing something that already exists: JSON-LD [1]. It was born out of the very idea you’re discussing here: providing “context” to your JSON data. With JSON-LD, you can attach a context definition that clarifies the meaning of each property in your JSON data. This definition allows you to map properties to URLs that can contain a natural-language description of what each property means (and much more).
[1] http://www.w3.org/TR/json-ld/
July 2, 2014 at 8:47 am
Thanks for the clarifying question, Dave. The core idea in GrowJSON is that when the definitions match, the things they define are taken to be identical.
You could do this in RDF (and thus JSON-LD) by defining a suitable vocabulary, which would mostly be an inverse-functional “human readable definition” property, and requiring consumers to do suitable inference. That’s what I was working on when I realized that given such a system, for my application, there wouldn’t be much benefit to the rest of RDF.
It’s heresy, I know. I’ve been deep in RDF for about 15 years. But my bottom line is actually the features and functionality, and at the moment my intuition says GrowJSON would win that, at least for the features I care about. More research is necessary to see if that pans out, and I figured that would start with talking to more people about GrowJSON.
July 2, 2014 at 12:25 pm
JSON-LD started out the same way — as independent from RDF. I noticed that your entire line of thinking was a near match for how the JSON-LD work began.
It was only after much argumentation that JSON-LD ended up adopting the RDF data model. It turned out that it was close enough … and that using it would satisfy various parties involved in the standardization process.
That being said, JSON-LD can be used almost entirely without any knowledge of RDF at all. It’s just about linking decentralized (JSON) data out there on the web in an easy way. Existing systems can just “attach a context” and then smarter clients can immediately use it to determine how the data meshes with other data out there — without interfering with older clients.
If you haven’t read it already, Manu elaborates a bit on JSON-LD’s origins here: http://manu.sporny.org/2014/json-ld-origins-2/
July 27, 2015 at 5:43 pm
Slight delay in my reply, sorry. 🙂
Dave, the problem with RDF (and thus JSON-LD) is that it only achieves its combination of interoperability and extensibility when people agree on which URIs to use for each concept. If people want to have interoperability, they are forced to cede authority and control, deferring to whoever owns that URI. After 15+ years in the Semantic Web community, I’ve become convinced that’s not going to happen in practice at the scale we need. The most widely adopted vocabularies we’ve seen (Dublin Core, FOAF, schema.org, various w3.org rec-track ones) are each very interesting cases and show just how hard it is, and why it doesn’t grow naturally. GrowJSON avoids this problem.