2018-12-22 | web programming

From RDF to JSON with JSON-LD

In a workshop at SWIB18, we recently shared our approach to making existing RDF data usable in contexts where JSON is expected. I think it’s an accessible way to use RDF data for search, visualization, and integration with existing software. This post is a short tutorial for the first part of that workshop. Check out the full workshop slides for details, and the setup instructions if you want to follow the examples.

JSON for web APIs

The basic idea is that JSON is what we want when we provide data on the web, e.g.:

$ curl https://api.github.com/repos/hbz/swib18-workshop
{
  "id": 150073510,
  "node_id": "MDEwOlJlcG9zaXRvcnkxNTAwNzM1MTA=",
  "name": "swib18-workshop",
  "full_name": "hbz/swib18-workshop",
  "private": false,
  "owner": {
    "login": "hbz",
    "id": 6557108,
    ...

We can easily access fields like owner.login:

$ curl https://api.github.com/repos/hbz/swib18-workshop | jq .owner.login
...
"hbz"

Serialized RDF as JSON-LD

But what if our source data is RDF? That’s where JSON-LD comes in: it is usable both as an RDF serialization (great for us) and as plain JSON, without knowing about RDF (great for users). So let’s convert some RDF to JSON-LD (using works from the Library of Congress):

$ jsonld import data.nt > loc.json
...
{
 "@id": "http://id.loc.gov/resources/works/c000101650",
 "http://id.loc.gov/ontologies/bibframe/subject": [{
  "@id": "http://id.loc.gov/resources/works/101650#Topic650-20"
 }, ...]
}, {
 "@id": "http://id.loc.gov/resources/works/101650#Topic650-20",
 "@type": [
  "http://id.loc.gov/ontologies/bibframe/Topic",
  "http://www.loc.gov/mads/rdf/v1#ComplexSubject"
 ],
 "http://www.w3.org/2000/01/rdf-schema#label": [{
  "@value": "Climatic changes--Europe."
 }],...
}

Hm, well yes, JSON, kind of. But no, this is not what we want, which is to access something like subject.label. We just have a flat array of objects, so there is no way to access that specific label directly. At this point, we really have the worst of both worlds: unwieldy URIs as keys from the RDF, and syntactic overhead from JSON.
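
To make the problem concrete, here is a minimal Python sketch of what getting at a subject label takes with the flat structure. The data is an abbreviated copy of the output above; the helper name subject_labels is made up for this illustration.

```python
# Abbreviated copy of the flat, expanded JSON-LD shown above.
SUBJECT = "http://id.loc.gov/ontologies/bibframe/subject"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

flat_nodes = [
    {
        "@id": "http://id.loc.gov/resources/works/c000101650",
        SUBJECT: [{"@id": "http://id.loc.gov/resources/works/101650#Topic650-20"}],
    },
    {
        "@id": "http://id.loc.gov/resources/works/101650#Topic650-20",
        "@type": [
            "http://id.loc.gov/ontologies/bibframe/Topic",
            "http://www.loc.gov/mads/rdf/v1#ComplexSubject",
        ],
        LABEL: [{"@value": "Climatic changes--Europe."}],
    },
]

# The work only holds references, so we first need an index of all nodes by @id.
nodes_by_id = {node["@id"]: node for node in flat_nodes}


def subject_labels(work):
    """Follow each subject reference and collect the label values (hypothetical helper)."""
    labels = []
    for reference in work.get(SUBJECT, []):
        subject = nodes_by_id.get(reference["@id"], {})
        for value in subject.get(LABEL, []):
            labels.append(value["@value"])
    return labels


work = nodes_by_id["http://id.loc.gov/resources/works/c000101650"]
labels = subject_labels(work)
print(labels)
```

So a simple subject.label needs an index, a join on @id, and full URIs as keys.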

Framed JSON-LD

JSON-LD framing allows us to frame the way we look at our RDF graph from the perspective of one entity type. This is what we need to get direct access to a specific field like subject.label. The work we’re looking at and its subject are not two equally relevant entities in this view, but instead the subject is an attribute of the work. Framing turns the graph into trees, or documents. A frame is itself a JSON object and can look like this:

{
 "@type": "http://id.loc.gov/ontologies/bibframe/Work",
 "@embed": "@always"
}

Using this frame, we can frame our serialized RDF:

$ jsonld frame -f frame.json loc.json > loc-framed.json
...
{
 "@id": "http://id.loc.gov/resources/works/c000101650",
 "http://id.loc.gov/ontologies/bibframe/subject": [{
   "@id":"http://id.loc.gov/resources/works/101650#Topic650-20",
   "@type": [
     "http://id.loc.gov/ontologies/bibframe/Topic",
     "http://www.loc.gov/mads/rdf/v1#ComplexSubject"
   ],
   ...
   "http://www.w3.org/2000/01/rdf-schema#label": "Climatic changes--Europe."
 }, {...}],
}

Conceptually, this is what we want: the actual subject data is embedded under the work’s .../subject field. But it’s still not very usable, since the JSON keys are URIs, so we can’t do something like subject.label.
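
The embedding idea can be sketched in a few lines of Python. This is not the JSON-LD framing algorithm, just a simplified illustration of turning the flat graph into a tree rooted at one type. The function names are made up, and the work’s @type is an assumption, since it is not shown in the abbreviated output above.

```python
import copy

WORK = "http://id.loc.gov/ontologies/bibframe/Work"
SUBJECT = "http://id.loc.gov/ontologies/bibframe/subject"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

flat_nodes = [
    {
        "@id": "http://id.loc.gov/resources/works/c000101650",
        "@type": [WORK],  # assumed: the frame above selects nodes of this type
        SUBJECT: [{"@id": "http://id.loc.gov/resources/works/101650#Topic650-20"}],
    },
    {
        "@id": "http://id.loc.gov/resources/works/101650#Topic650-20",
        "@type": ["http://id.loc.gov/ontologies/bibframe/Topic"],
        LABEL: [{"@value": "Climatic changes--Europe."}],
    },
]


def embed(node, nodes_by_id, seen=frozenset()):
    """Copy a node, replacing references to known nodes with the nodes themselves."""
    seen = seen | {node["@id"]}  # guards against cycles in the graph
    result = {}
    for key, values in node.items():
        if key.startswith("@"):  # keywords like @id and @type are copied as they are
            result[key] = copy.deepcopy(values)
            continue
        embedded = []
        for value in values:
            target = nodes_by_id.get(value.get("@id")) if isinstance(value, dict) else None
            if target is not None and target["@id"] not in seen:
                embedded.append(embed(target, nodes_by_id, seen))
            else:
                embedded.append(copy.deepcopy(value))
        result[key] = embedded
    return result


def frame_by_type(nodes, type_uri):
    """Return one tree per node of the given type, a rough stand-in for a frame."""
    nodes_by_id = {node["@id"]: node for node in nodes}
    return [embed(node, nodes_by_id) for node in nodes if type_uri in node.get("@type", [])]


framed = frame_by_type(flat_nodes, WORK)
print(framed[0][SUBJECT][0][LABEL][0]["@value"])
```

The label is now reachable from the work without a lookup, but only through the full URIs.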

Compacted JSON-LD

That’s where the JSON-LD context comes in: the context, like the frame, is itself a JSON object. It defines a mapping of JSON keys to URIs:

{
 "name": "http://schema.org/name",
 ...
}

With this context, we can compact (replace URIs with short keys) or expand (replace short keys with URIs) our JSON-LD. (For complex data, creating the context can be a major task.) We’re going to use compaction to turn our URIs into nice, usable keys:

$ jsonld compact -c context.json loc-framed.json > loc-compact.json
...
{
 "id": "http://id.loc.gov/resources/works/c000101650",
 "subject": [
  {
   "id": "http://id.loc.gov/resources/works/101650#Topic650-20",
   "type": [
    "Topic",
    "ComplexSubject"
   ],
   "label": "Climatic changes--Europe."
  }
 ], ...
}

With this final format, we can now easily access specific fields directly, like the id, type, or label of a subject (subjects are actually multiple objects stored in an array, thus the subject[] notation):

$ cat loc-compact.json | jq '.subject[].label'

"Climatic changes--Europe."
"Climatic changes--Social aspects--Europe."
"Civilization, Medieval."

And at this point, we actually have the best of both worlds: RDF-compatible linked data, and useful, practical JSON.
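
For a feel of what compaction does, here is a rough Python sketch of the key replacement. It is a simplification, not the JSON-LD compaction algorithm, and the context is a hypothetical one written for this example (the context used in the workshop is only shown abbreviated above).

```python
SUBJECT = "http://id.loc.gov/ontologies/bibframe/subject"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
TOPIC = "http://id.loc.gov/ontologies/bibframe/Topic"
COMPLEX_SUBJECT = "http://www.loc.gov/mads/rdf/v1#ComplexSubject"

# Hypothetical context for illustration: short key -> URI or keyword.
context = {
    "id": "@id",
    "type": "@type",
    "subject": SUBJECT,
    "label": LABEL,
    "Topic": TOPIC,
    "ComplexSubject": COMPLEX_SUBJECT,
}

# Compaction goes the other way, so invert the mapping: URI or keyword -> short key.
terms = {uri: term for term, uri in context.items()}


def compact(value, is_type=False):
    """Replace URI keys (and @type values) with short terms; leave everything else."""
    if isinstance(value, dict):
        return {terms.get(k, k): compact(v, is_type=(k == "@type")) for k, v in value.items()}
    if isinstance(value, list):
        return [compact(v, is_type) for v in value]
    if is_type and isinstance(value, str):
        return terms.get(value, value)
    return value


# Abbreviated copy of the framed document from the previous step.
framed = {
    "@id": "http://id.loc.gov/resources/works/c000101650",
    SUBJECT: [
        {
            "@id": "http://id.loc.gov/resources/works/101650#Topic650-20",
            "@type": [TOPIC, COMPLEX_SUBJECT],
            LABEL: "Climatic changes--Europe.",
        }
    ],
}

compacted = compact(framed)
labels = [subject["label"] for subject in compacted["subject"]]
print(labels)
```

The list comprehension at the end is the Python counterpart of the jq filter above.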

The JSON view

To add perspective, what we’ve seen here was the RDF-first approach to JSON-LD, which involved multiple steps to create usable JSON-LD: first serialize RDF as JSON-LD, then frame it, finally compact it. The JSON-first approach to JSON-LD is very different: take some useful JSON, add IDs and context:

{
  "@id": "https://github.com/users/hbz",
  "type": "Organization",
  "name": "hbz",
  "blog": "https://www.hbz-nrw.de",
  "@context": {
    "type": "@type",
    "name": "https://schema.org/name",
    "blog": { "@id": "https://schema.org/url", "@type": "@id" },
    "Organization": "https://schema.org/Organization"
  }
}

You can paste a document like the one above into the JSON-LD playground, where you can tweak the input and see different output formats.
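
To see why this plain-looking JSON is still linked data, here is a small Python sketch that applies the embedded context to the document above. It only covers the context features used in this one document and is no substitute for a real JSON-LD processor; the function name expand is just for this illustration.

```python
doc = {
    "@id": "https://github.com/users/hbz",
    "type": "Organization",
    "name": "hbz",
    "blog": "https://www.hbz-nrw.de",
    "@context": {
        "type": "@type",
        "name": "https://schema.org/name",
        "blog": {"@id": "https://schema.org/url", "@type": "@id"},
        "Organization": "https://schema.org/Organization",
    },
}


def expand(document):
    """Replace short keys with the URIs from the document's own @context (simplified)."""
    context = document["@context"]
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        definition = context.get(key, key)
        if isinstance(definition, dict):
            # Expanded term definition: the URI plus a hint that values are links.
            uri = definition["@id"]
            value_is_link = definition.get("@type") == "@id"
        else:
            uri, value_is_link = definition, False
        if uri == "@id":
            expanded["@id"] = value
        elif uri == "@type":
            expanded["@type"] = [context.get(value, value)]
        elif value_is_link:
            expanded[uri] = [{"@id": value}]
        else:
            expanded[uri] = [{"@value": value}]
    return expanded


expanded = expand(doc)
print(expanded)
```

The result has the same shape as the serialized RDF from the beginning of this post: URIs as keys, with values wrapped in @value or @id objects.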

Using JSON

The resulting JSON data can be used in many ways. In our workshop we indexed it in Elasticsearch, accessed it from a web app, and used it in Kibana and OpenRefine. JSON-LD provides a bridge for using RDF data in such contexts.