The YamlJdbc driver allows reading YAML files via JDBC. Using SQL queries, data from a YAML file can be extracted and processed just like from a database table.
Sequences (lists) in a YAML document are addressed as tables for querying. Three components are used in the query:
The following YAML document can be read, for example, using the driver:
people:
- id: 10
profession: scientist
fname: Henry
lname: Cavendish
- id: 21
profession: scientist
fname: Isaac
lname: Newton
- id: 33
profession: scientist
fname: Doctor
lname: Seltsam
An SQL query over the YamlJdbc driver would look like this:
SELECT fname
, lname
, profession
FROM "yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml/people.yaml"."people"
COLUMNS(fname, lname, profession)
Here, the filename "yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml/people.yaml" is specified as the schema and "people" is a path expression that leads to the "people" sequence in the document.
Within the "people" sequence, the keys "fname", "lname" and "profession" of the mapping elements are addressed.
The result looks like this:
fname lname profession
------ --------- ----------
Henry Cavendish scientist
Isaac Newton scientist
Doctor Seltsam scientist
As usual, column aliases can be used to rename the columns:
SELECT fname AS firstname
, lname AS lastname
, profession AS profession
FROM "yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml/people.yaml"."people"
COLUMNS(fname, lname, profession)
This produces:
firstname lastname profession
------- --------- ---------
Henry Cavendish scientist
Isaac Newton scientist
Doctor Seltsam scientist
In addition to the data columns from the YAML document, the YamlJdbc driver generates additional pseudo-columns:
SELECT rownumber AS rownum
, "$id" AS row_id
, fname
, lname
, profession
FROM "yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml/people.yaml"."people"
COLUMNS(fname, lname, profession)
The pseudo-column "rownumber" returns a sequential number for each record in the result. The pseudo-column "$id" returns a sequential number for the element in the document (starting at 0). The result of the example query looks like this:
rownum row_id fname lname profession
------ --------- ------ --------- ----------
1 0 Henry Cavendish scientist
2 1 Isaac Newton scientist
3 2 Doctor Seltsam scientist
Note: There is also the pseudo-column "$parentid". It returns the path of the parent element; more on this in the next section.
An element in a YAML document can contain further nested structures, as in the following example:
people:
- id: 10
profession: scientist
fname: Henry
lname: Cavendish
research:
- area: Chemistry
- area: Density of the Earth
- area: Electricity
- id: 21
profession: scientist
fname: Isaac
lname: Newton
research:
- area: Calculus
- area: Optics
- area: Gravity
Nested structures can be read in two ways:
For simple nested structures, dot notation can be used:
company:
name: Acme Corp
employees:
- name: John Doe
department: IT
salary: 75000
- name: Jane Smith
department: HR
salary: 65000
SELECT name, department, salary
FROM "employees.yaml"."company.employees"
COLUMNS(name, department, salary)
The dot notation "company.employees" navigates through the nested structure directly to the "employees" sequence.
This produces:
name department salary
---------- ---------- ------
John Doe IT 75000
Jane Smith HR 65000
This is the simplest approach, but the reference to the parent elements is missing.
The YamlJdbc driver also supports comma-separated navigation, similar to JsonJdbc and XmlJdbc. This is particularly useful for nested lists:
SELECT area
FROM "yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml/researcher.yaml"."people,research"
COLUMNS(area)
Here "people" navigates to the first hierarchy level (list of persons) and "research" (after the comma) to the second level relative to each found "people" element.
This returns all "research" entries of all persons:
area
--------------------
Chemistry
Density of the Earth
Electricity
Calculus
Optics
Gravity
With the pseudo-column "$parentid" you can see the relationship:
SELECT "$parentid", area
FROM "yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml/researcher.yaml"."people,research"
COLUMNS(area)
This produces:
$parentid area
--------- --------------------
0 Chemistry
0 Density of the Earth
0 Electricity
1 Calculus
1 Optics
1 Gravity
The "$parentid" shows which "people" element the "research" entries belong to (0 = first element, 1 = second element, etc.).
The pseudo-column "$parentid" enables the correlation between nested elements and their parent elements.
Best practice for nested hierarchies:
SELECT "$id", fname, lname
FROM "researcher.yaml"."people"
COLUMNS(fname, lname)
This produces:
$id fname lname
--- ------ ---------
0 Henry Cavendish
1 Isaac Newton
SELECT "$parentid", area
FROM "researcher.yaml"."people,research"
COLUMNS(area)
This produces:
$parentid area
--------- --------------------
0 Chemistry
0 Density of the Earth
0 Electricity
1 Calculus
1 Optics
1 Gravity
Dot notation (company.employees):
Comma separation (people,research):
Example of the difference:
data:
level1:
- item: A
level2:
- value: 1
- value: 2
-- Dot notation: Direct to level2 (if you don't need the hierarchy)
FROM "file.yaml"."data.level1.level2"
-- Comma separation: Preserves hierarchy information
FROM "file.yaml"."data.level1,level2"
The YamlJdbc driver supports directories as system tables. This allows searching for files in a directory in a loop query of a datasqill module, which can then be loaded.
The following query uses the "files()" function as a system table and its columns "directory" and "filename".
SELECT directory, filename
FROM "yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml".files()
WHERE filename LIKE '%.yaml'
ORDER BY filename
As a result, the searched files in the used directory are returned:
directory filename
--------------------------------------------------- --------------
yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml books.yaml
yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml employees.yaml
yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml people.yaml
yamljdbc/src/test/resources/de/softquadrat/jdbc/yaml researcher.yaml
The YamlJdbc driver returns all results as text. Conversion to other data types must be performed in downstream processing steps.
YAML supports various data types (strings, numbers, booleans, null), but the driver converts all values to strings for the JDBC output.
If a query returns no records, it should be verified whether the path expression in the FROM clause actually navigates to a YAML sequence. If this is not the case, the YamlJdbc driver ignores the query and returns no results.
The YamlJdbc driver returns the content of selected columns in the SELECT clause. If the addressed key does not exist, a null value is returned.
items:
- id: 1
name: Test1
- id: 2
# name missing
SELECT id
, name
, description
FROM "test.yaml"."items"
COLUMNS(id, name, description)
id name description
-- ----- -----------
1 Test1 null
2 null null
The YamlJdbc driver currently does not support YAML anchors (&) and aliases (*). These are resolved by the YAML parser (SnakeYAML) before the data is processed.
When a value itself is a complex structure (nested mapping or sequence), it is returned as a string representation:
items:
- id: 1
name: Test
config:
setting1: value1
setting2: value2
SELECT id, name, config
FROM "test.yaml"."items"
COLUMNS(id, name, config)
The "config" field is returned as a string representation of the nested object (e.g. {setting1=value1, setting2=value2}).
The YamlJdbc driver supports various YAML formats:
Sequence at root level:
- id: 1
name: Item 1
- id: 2
name: Item 2
Usage: FROM "file.yaml"."/"
Named sequence:
items:
- id: 1
name: Item 1
- id: 2
name: Item 2
Usage: FROM "file.yaml"."items"
Single mapping as sequence: The driver also treats a single mapping element as a single-row sequence:
person:
id: 1
name: John
Usage: FROM "file.yaml"."person"
This returns a single row.
The YamlJdbc driver uses:
For optimal performance with large YAML files:
Since YAML files are fully loaded into memory:
Note: YAML is optimized for smaller configuration files. For very large data volumes, formats like JSON or CSV are more efficient.
The YamlJdbc driver uses UTF-8 as the default character set for reading YAML files. This is the recommended standard for YAML documents.
books:
- id: 1
category: fiction
title: The Great Gatsby
author: F. Scott Fitzgerald
year: 1925
price: 10.99
- id: 2
category: fiction
title: To Kill a Mockingbird
author: Harper Lee
year: 1960
price: 12.99
SELECT category, title, author, year, price
FROM "books.yaml"."books"
COLUMNS(category, title, author, year, price)
WHERE category = 'fiction'
company:
name: Acme Corp
departments:
- name: IT
employees:
- name: John Doe
salary: 75000
- name: Jane Smith
salary: 80000
- name: HR
employees:
- name: Bob Johnson
salary: 65000
-- All employees with department info
SELECT "$parentid" AS dept_id, name, salary
FROM "company.yaml"."company.departments,employees"
COLUMNS(name, salary)
servers:
- hostname: web01
ip: 192.168.1.10
role: webserver
active: true
- hostname: db01
ip: 192.168.1.20
role: database
active: true
- hostname: backup01
ip: 192.168.1.30
role: backup
active: false
SELECT hostname, ip, role
FROM "servers.yaml"."servers"
COLUMNS(hostname, ip, role, active)
WHERE active = 'true'
The YamlJdbc driver offers a simple and intuitive way to read YAML files via SQL queries:
The driver is particularly suitable for: