The JsonJdbc driver allows reading JSON files via JDBC. Using SQL queries, data from a JSON file can be extracted and processed just like from a database table.
Elements in a JSON document are addressed as tables for querying. Three components are used in the query:
All paths are specified as JSON Pointer. For the syntax, see the RFC on JSON Pointer.
The following JSON document can be read, for example, using the driver:
{
"people": [
{
"id": 10,
"name": {
"fname": "Henry",
"lname": "Cavendish"
},
"profession": "scientist"
},
{
"id": 21,
"name": {
"fname": "Isaac",
"lname": "Newton"
},
"profession": "scientist"
},
{
"id": 33,
"name": {
"fname": "Doctor",
"lname": "Seltsam"
},
"profession": "scientist"
}
]
}
An SQL query over the JsonJdbc driver would look like this:
SELECT "/id"
, "/name/fname"
, "/name/lname"
, "/profession"
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Small.json"."/people" COLUMNS("/id", "/name/fname", "/name/lname", "/profession")
Here, the filename "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Small.json" is specified as the schema and "/people" is a JSON Pointer that leads to the "people" array in the document.
Within the records from the array, the columns "/id", "/name/fname", "/name/lname" and "/profession" are again addressed using JSON Pointers.
The result looks like this:
/id /name/fname /name/lname /profession
--- ----------- ----------- -----------
10 Henry Cavendish scientist
21 Isaac Newton scientist
33 Doctor Seltsam scientist
As usual, column aliases can be used to convert the somewhat cryptic JSON Pointers into more readable names:
SELECT "/id" AS id
, "/name/fname" AS fname
, "/name/lname" AS lname
, "/profession" AS profession
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Small.json"."/people" COLUMNS("/id", "/name/fname", "/name/lname", "/profession")
This produces:
id fname lname profession
-- ----- ----- ----------
10 Henry Cavendish scientist
21 Isaac Newton scientist
33 Doctor Seltsam scientist
In addition to the data columns from the JSON document, the JsonJdbc driver generates additional pseudo-columns:
SELECT rownumber AS rownum
, id AS path
, "/id" AS id
, "/name/fname" AS fname
, "/name/lname" AS lname
, "/profession" AS profession
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Small.json"."/people" COLUMNS("/id", "/name/fname", "/name/lname", "/profession")
The pseudo-column "rownumber" returns a sequential number for each record in the result. The pseudo-column "id" returns the JSON Pointer of the record in the document. The result of the example query looks like this:
rownum path id fname lname profession
------ ---- -- ----- ----- ----------
1 /people/0 10 Henry Cavendish scientist
2 /people/1 21 Isaac Newton scientist
3 /people/2 33 Doctor Seltsam scientist
Note: There is also the pseudo-column "parentid". It returns the JSON Pointer of the parent array; more on this in the next section.
An array in a JSON document can contain sub-arrays, as in the following example:
{
"people": [
{
"id": 10,
"name": {
"fname": "Henry",
"lname": "Cavendish"
},
"profession": "scientist",
"research": [
{
"area": "Chemistry"
},
{
"area": "Density of the Earth"
},
{
"area": "Electricity"
}
]
},
{
"id": 21,
"name": {
"fname": "Isaac",
"lname": "Newton"
},
"profession": "scientist",
"research": [
{
"area": "Calculus"
},
{
"area": "Optics"
},
{
"area": "Gravity"
}
]
},
{
"id": 33,
"name": {
"fname": "Doctor",
"lname": "Seltsam"
},
"profession": "scientist"
}
]
}
Sub-arrays can also be addressed as tables. The array hierarchy must be specified in the FROM clause as a comma-separated list of JSON Pointers:
SELECT "/area" AS area
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Researcher.json"."/people,/research" COLUMNS("/area")
Here "/people" is the JSON Pointer to the first array. This array contains the elements "/people/0", "/people/1", etc. Each of these elements can again contain a sub-array with the relative path "/research".
The query produces the following result:
area
-----
Chemistry
Density of the Earth
Electricity
Calculus
Optics
Gravity
Note: The columns referenced via JSON Pointer always navigate to elements of the sub-array "/research". The elements of the parent array "/people" cannot be addressed in the same SELECT clause.
Unfortunately, this means there is no direct correlation of the found records to the parent table or the parent JSON array. The pseudo-column "parentid" helps here:
SELECT "parentid" AS parentid
,"/area" AS area
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/Researcher.json"."/people,/research" COLUMNS("/area")
Now the found rows can be correlated with the parent table.
parentid area
-------- ----
/people/0 Chemistry
/people/0 Density of the Earth
/people/0 Electricity
/people/1 Calculus
/people/1 Optics
/people/1 Gravity
The values of the pseudo-column "parentid" in the sub-array correspond to the values of the pseudo-column "id" in the parent array above. This allows the following approach for loading a JSON document into database tables:
Instead of navigating to a JSON array, navigation to a single JSON object via JSON Pointer is also possible. In this case, the JsonJdbc driver generates an artificial array containing only this one object and performs the subsequent selection on this array. As a result, it returns the object and its elements (columns) as the single found record.
Here is a simple example. For this JSON file
{
"fname": "John",
"lname": "Doe"
}
this query
SELECT "/fname"
, "/lname"
FROM "src/test/resources/de/softquadrat/jdbc/json/ObjectTest.json"."/" COLUMNS("/fname", "/lname")
produces this result
/fname /lname
------ ------
John Doe
The JsonJdbc driver supports directories as system tables. This allows searching for files in a directory in a loop query of a datasqill module, which can then be loaded.
The following query uses the "files()" function as a system table and its columns "directory" and "filename".
SELECT directory, filename
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json".files()
WHERE filename = 'JsonValidate.json' or filename = 'PeopleTest.json'
As a result, the searched files in the used directory are returned:
directory filename
--------- --------
jsonjdbc/src/test/resources/de/softquadrat/jdbc/json JsonValidate.json
jsonjdbc/src/test/resources/de/softquadrat/jdbc/json PeopleTest.json
The JsonJdbc driver returns all results as text. Conversion to other data types must be performed in downstream processing steps.
If a query returns no records, it should be verified whether the JSON Pointers in the FROM clause actually navigate to a JSON array or JSON object. If this is not the case, the JsonJdbc driver ignores the query and returns no results.
The JsonJdbc driver returns the content of selected columns in the SELECT clause. If the element addressed by JSON Pointer does not exist or is null, a null value is returned.
{
"test" : [
{
"id": 1,
"name": "Test1"
}
{
"id": 2,
"name": null
}
]
}
SELECT "/id"
, "/name"
, "/illegal"
FROM "test.json"."/test" COLUMNS("/id", "/name", "/illegal")
/id /name /illegal
--- ----- --------
1 Test1 null
2 null null
If the element addressed by JSON Pointer in the SELECT clause is not "null", not a boolean value, not a number and not a string, but an object or an array, the found JSON substring is returned as the result:
{
"test" : [
{
"id": 1,
"name": "Test1",
"child": {
"date": "17.3.2022",
"description": "only a test"
}
},
{
"id": 2,
"name": "Test2",
"child": [
{
"col1": "One",
"col2": "Two"
}
]
}
]
}
SELECT "/id"
, "/name"
, "/child"
FROM "test.json"."/test" COLUMNS("/id", "/name", "/child")
/id /name /child
--- ----- ------
1 Test1 {"date":"17.3.2022","description":"only a test"}
2 Test2 [{"col1":"One","col2":"Two"}]
The JSON Pointer syntax carefully distinguishes "" and "/". While "" navigates to the entire document, "/" addresses the element with the key "" in the JSON document.
Clear? Here is a concrete example from the RFC:
{
"foo": ["bar", "baz"],
"": 0
}
The following JSON Pointers navigate to the following elements in the tree:
"/foo" -> ["bar", "baz"]
"/foo/0" -> "bar"
"/" -> 0
"" -> {"foo": ["bar", "baz"], "": 0}
Because the SQL parser of the Jdbcjdbc driver does not support empty columns in the SELECT clause and empty tables in the FROM clause, it uses "/" for the entire document. So, in deviation from the JSON Pointer standard, "/" must be used for navigation to the entire document in the FROM clause and for selection of the entire document in the SELECT clause:
[
{ "name": { "fname": "John", "lname": "Doe"} },
{ "name": { "fname": "Jane", "lname": "Roberts"} }
]
SELECT "/name/fname" AS fname
, "/name/lname" AS lname
, "/" AS json
FROM "jsonjdbc/src/test/resources/de/softquadrat/jdbc/json/ArrayTest.json"."/" COLUMNS("/name/fname","/name/lname","/")
Here, the entire JSON document is addressed with "/" in the FROM clause. Furthermore, with "/" in the SELECT clause, the JSON document of the respective array element is returned.
fname lname json
----- ----- ----
John Doe {"name":{"fname":"John","lname":"Doe"}}
Jane Roberts {"name":{"fname":"Jane","lname":"Roberts"}}
This deviation has the consequence that elements with empty key "" in a JSON document cannot be addressed (FROM clause) or selected (SELECT clause) via JSON Pointer.
If you have a JSON file with an object, the whole object can be selected as a JSON string:
{
"id": 1,
"name": "Test1",
"child": {
"date": "17.3.2022",
"description": "only a test"
}
}
This SQL navigates to the root node of the document and selects the entire JSON object as a string:
SELECT "/" AS json
FROM "test3.json"."/" COLUMNS("/")
This produces this result:
json
----
{"id":1,"name":"Test1","child":{"date":"17.3.2022","description":"only a test"}}
Attention: If the root element in the JSON file is an array, this array is treated as a table and the individual elements in the array are returned. In this case, the entire JSON (including the square brackets []) cannot be selected.
This means that this JSON file
[
{
"id": 1,
"name": "Test1",
"child": {
"date": "17.3.2022",
"description": "only a test"
}
}
]
produces the same result with the same SQL as the previous example with the simple object. In other words, the two examples are equivalent because the JsonJdbc driver supplements the array independently in the first example (see section JSON Objects).