JsonJdbc Driver

Introduction

The JsonJdbc driver allows reading JSON files via JDBC. Using SQL queries, data from a JSON file can be extracted and processed just like from a database table.

Elements in a JSON document are addressed as tables for querying. Three components are used in the query:

a dummy connection 'jdbc:json:' is used as the database
the schema is a JSON file (in the FROM clause)
the table is the path in the JSON to a JSON array or a single JSON object (in the FROM clause)
the columns are paths to the respective elements within the JSON node

All paths are specified as JSON Pointer. For the syntax, see the RFC on JSON Pointer.

The following JSON document can be read, for example, using the driver:

{
    "people": [
        {
            "id": 10,
            "name": {
                "fname": "Henry",
                "lname": "Cavendish"
            },
            "profession": "scientist"
        },
        {
            "id": 21,
            "name": {
                "fname": "Isaac",
                "lname": "Newton"
            },
            "profession": "scientist"
        },
        {
            "id": 33,
            "name": {
                "fname": "Doctor",
                "lname": "Seltsam"
            },
            "profession": "scientist"
        }
    ]
}

An SQL query over the JsonJdbc driver would look like this:

SELECT  "/id"
    , "/name/fname"
    , "/name/lname"
    , "/profession"
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Small.json"."/people" COLUMNS("/id", "/name/fname", "/name/lname", "/profession")

Here, the filename "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Small.json" is specified as the schema and "/people" is a JSON Pointer that leads to the "people" array in the document.

Within the records from the array, the columns "/id", "/name/fname", "/name/lname" and "/profession" are again addressed using JSON Pointers.

The result looks like this:

/id /name/fname /name/lname /profession
--- ----------- ----------- ----------- 
10  Henry       Cavendish   scientist   
21  Isaac       Newton      scientist   
33  Doctor      Seltsam     scientist

Aliases

As usual, column aliases can be used to convert the somewhat cryptic JSON Pointers into more readable names:

SELECT  "/id" AS id
    , "/name/fname" AS fname
    , "/name/lname" AS lname
    , "/profession" AS profession
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Small.json"."/people" COLUMNS("/id", "/name/fname", "/name/lname", "/profession")

This produces:

id fname lname profession
-- ----- ----- ---------- 
10 Henry Cavendish scientist 
21 Isaac Newton scientist 
33 Doctor Seltsam scientist

Pseudo-Columns

In addition to the data columns from the JSON document, the JsonJdbc driver generates additional pseudo-columns:

rownumber
id
parentid

SELECT rownumber AS rownum
    , id AS path
    , "/id" AS id
    , "/name/fname" AS fname
    , "/name/lname" AS lname
    , "/profession" AS profession
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Small.json"."/people" COLUMNS("/id", "/name/fname", "/name/lname", "/profession")

The pseudo-column "rownumber" returns a sequential number for each record in the result. The pseudo-column "id" returns the JSON Pointer of the record in the document. The result of the example query looks like this:

rownum path id fname lname profession
------ ---- -- ----- ----- ---------- 
1      /people/0 10 Henry Cavendish scientist 
2      /people/1 21 Isaac Newton scientist 
3      /people/2 33 Doctor Seltsam scientist

Note: There is also the pseudo-column "parentid". It returns the JSON Pointer of the parent array; more on this in the next section.

Sub-Arrays

An array in a JSON document can contain sub-arrays, as in the following example:

{
    "people": [
        {
            "id": 10,
            "name": {
                "fname": "Henry",
                "lname": "Cavendish"
            },
            "profession": "scientist",
            "research": [
                {
                    "area": "Chemistry"
                },
                {
                    "area": "Density of the Earth"
                },
                {
                    "area": "Electricity"
                }
            ]
        },
        {
            "id": 21,
            "name": {
                "fname": "Isaac",
                "lname": "Newton"
            },
            "profession": "scientist",
            "research": [
                {
                    "area": "Calculus"
                },
                {
                    "area": "Optics"
                },
                {
                    "area": "Gravity"
                }
            ]
        },
        {
            "id": 33,
            "name": {
                "fname": "Doctor",
                "lname": "Seltsam"
            },
            "profession": "scientist"
        }
    ]
}

Sub-arrays can also be addressed as tables. The array hierarchy must be specified in the FROM clause as a comma-separated list of JSON Pointers:

SELECT "/area" AS area
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Researcher.json"."/people,/research" COLUMNS("/area")

Here "/people" is the JSON Pointer to the first array. This array contains the elements "/people/0", "/people/1", etc. Each of these elements can again contain a sub-array with the relative path "/research".

The query produces the following result:

area
----- 
Chemistry 
Density of the Earth 
Electricity 
Calculus 
Optics 
Gravity

Note: The columns referenced via JSON Pointer always navigate to elements of the sub-array "/research". The elements of the parent array "/people" cannot be addressed in the same SELECT clause.

Unfortunately, this means there is no direct correlation of the found records to the parent table or the parent JSON array. The pseudo-column "parentid" helps here:

SELECT "parentid" AS parentid
    ,"/area" AS area
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Researcher.json"."/people,/research" COLUMNS("/area")

Now the found rows can be correlated with the parent table.

parentid area
-------- ---- 
/people/0 Chemistry 
/people/0 Density of the Earth 
/people/0 Electricity 
/people/1 Calculus 
/people/1 Optics 
/people/1 Gravity

The values of the pseudo-column "parentid" in the sub-array correspond to the values of the pseudo-column "id" in the parent array above. This allows the following approach for loading a JSON document into database tables:

First, read the root array that is highest in the hierarchy into a database table. Store the pseudo-column "id" as the primary key.
Then, read the directly subordinate sub-array into a child table. Use "parentid" as the foreign key; "id" can again be used as the primary key.
If there are further sub-arrays under the found sub-array, continue the process iteratively until all desired data is written to the corresponding tables.

JSON Objects

Instead of navigating to a JSON array, navigation to a single JSON object via JSON Pointer is also possible. In this case, the JsonJdbc driver generates an artificial array containing only this one object and performs the subsequent selection on this array. As a result, it returns the object and its elements (columns) as the single found record.

Here is a simple example. For this JSON file

{
  "fname": "John",
  "lname": "Doe"
}

this query

SELECT "/fname"
    , "/lname"
    FROM "src/test/resources/de/softquadrat/jdbc/json/ObjectTest.json"."/" COLUMNS("/fname", "/lname")

produces this result

/fname /lname
------ ------ 
John   Doe

System Tables

The JsonJdbc driver supports directories as system tables. This allows searching for files in a directory in a loop query of a datasqill module, which can then be loaded.

The following query uses the "files()" function as a system table and its columns "directory" and "filename".

SELECT directory, filename
    FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json".files()
    WHERE filename = 'JsonValidate.json' or filename = 'PeopleTest.json'

As a result, the searched files in the used directory are returned:

directory filename
--------- -------- 
jsonjdbc/src/test/resources/de/softquadrat/jdbc/json JsonValidate.json 
jsonjdbc/src/test/resources/de/softquadrat/jdbc/json PeopleTest.json

Notes

Data Types

The JsonJdbc driver returns all results as text. Conversion to other data types must be performed in downstream processing steps.

Empty Result

If a query returns no records, it should be verified whether the JSON Pointers in the FROM clause actually navigate to a JSON array or JSON object. If this is not the case, the JsonJdbc driver ignores the query and returns no results.

Null Values

The JsonJdbc driver returns the content of selected columns in the SELECT clause. If the element addressed by JSON Pointer does not exist or is null, a null value is returned.

{
    "test" : [
        {
            "id": 1,
            "name": "Test1"
        }
        {
            "id": 2,
            "name": null
        }
    ]
}

SELECT "/id"
    , "/name"
    , "/illegal"
FROM "test.json"."/test" COLUMNS("/id", "/name", "/illegal")

/id /name /illegal
--- ----- -------- 
1   Test1 null     
2   null  null

Reading a JSON Substring

If the element addressed by JSON Pointer in the SELECT clause is not "null", not a boolean value, not a number and not a string, but an object or an array, the found JSON substring is returned as the result:

{
    "test" : [
        {
            "id": 1,
            "name": "Test1",
            "child": {
                "date": "17.3.2022",
                "description": "only a test"
            }
        },
        {
            "id": 2,
            "name": "Test2",
            "child": [
                {
                    "col1": "One",
                    "col2": "Two"
                }
            ]
        }
    ]
}

SELECT "/id"
    , "/name"
    , "/child"
FROM "test.json"."/test" COLUMNS("/id", "/name", "/child")

/id /name /child
--- ----- ------ 
1   Test1 {"date":"17.3.2022","description":"only a test"} 
2   Test2 [{"col1":"One","col2":"Two"}]

JSON Pointer Limitations

The JSON Pointer syntax carefully distinguishes "" and "/". While "" navigates to the entire document, "/" addresses the element with the key "" in the JSON document.

Clear? Here is a concrete example from the RFC:

{
  "foo": ["bar", "baz"],
  "": 0
}

The following JSON Pointers navigate to the following elements in the tree:

"/foo" -> ["bar", "baz"]
"/foo/0" -> "bar"
"/" -> 0
""  -> {"foo": ["bar", "baz"], "": 0}

Because the SQL parser of the Jdbcjdbc driver does not support empty columns in the SELECT clause and empty tables in the FROM clause, it uses "/" for the entire document. So, in deviation from the JSON Pointer standard, "/" must be used for navigation to the entire document in the FROM clause and for selection of the entire document in the SELECT clause:

[
  { "name": { "fname": "John", "lname": "Doe"} },
  { "name": { "fname": "Jane", "lname": "Roberts"} }
]

SELECT "/name/fname" AS fname
    , "/name/lname" AS lname
    , "/" AS json
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/ArrayTest.json"."/" COLUMNS("/name/fname","/name/lname","/")

Here, the entire JSON document is addressed with "/" in the FROM clause. Furthermore, with "/" in the SELECT clause, the JSON document of the respective array element is returned.

fname lname json
----- ----- ---- 
John  Doe   {"name":{"fname":"John","lname":"Doe"}} 
Jane  Roberts {"name":{"fname":"Jane","lname":"Roberts"}}

This deviation has the consequence that elements with empty key "" in a JSON document cannot be addressed (FROM clause) or selected (SELECT clause) via JSON Pointer.

Reading Entire JSON

If you have a JSON file with an object, the whole object can be selected as a JSON string:

{
  "id": 1,
  "name": "Test1",
  "child": {
    "date": "17.3.2022",
    "description": "only a test"
  }
}

This SQL navigates to the root node of the document and selects the entire JSON object as a string:

SELECT "/" AS json
FROM "test3.json"."/" COLUMNS("/")

This produces this result:

json
---- 
{"id":1,"name":"Test1","child":{"date":"17.3.2022","description":"only a test"}}

Attention: If the root element in the JSON file is an array, this array is treated as a table and the individual elements in the array are returned. In this case, the entire JSON (including the square brackets []) cannot be selected.

This means that this JSON file

[
    {
        "id": 1,
        "name": "Test1",
        "child": {
            "date": "17.3.2022",
            "description": "only a test"
        }
    }
]

produces the same result with the same SQL as the previous example with the simple object. In other words, the two examples are equivalent because the JsonJdbc driver supplements the array independently in the first example (see section JSON Objects).