The module creates a history of a database so that versions that existed at certain points in time can be traced. For this it needs the source query to be historized and a target table in which the change history is recorded.
See Data historization / versioning
| Name | Meaning |
|---|---|
| Module | History |
| Module Class | DsModTemplateSQL |
| Type | Java |
| Purpose | Based on a defined primary key, the data of the specified SQL query should be historically integrated into the target table |
| Transformation Code | SQL query |
| Sources | Source tables in a database |
| Targets | Target table in the same database |
This module enables integrating data from a current view into a table with a historical view.
The result of the specified query is added to the periods of the target table.
For this, a technical or business key must be specified that is used as the basis for historically correct period formation. The key of the source query plus the period (start and/or end, since there is no period data type in databases) form the technical key of the target table.
The following example shows how to build the history of source information with the help of the module. First, the SQL query used:
SELECT r.r_regionkey AS region_key
, r.r_name region_name
, n.n_nationkey AS nation_key
, n.n_name nation_name
FROM stage.region r
, stage.nation n
WHERE 1 = 1
AND n.n_regionkey = r.r_regionkey
The target table has the following columns:
| Column |
|---|
| region_key |
| region_name |
| nation_key |
| nation_name |
| valid_from_dt |
| invalid_from_dt |
The column region_key is specified as the key.
Now when this transformation is executed regularly (typically daily), the current data from the source query is integrated into the target table. If nothing changes for a primary key, the data for this primary key in the target table also does not change. If data for a primary key changes, the current period is closed and a new period with the changed data is created.
The number of generated SQL statements (Insert / Update or Merge) to perform the historization depends on the chosen configuration (with/without delete flag, with/without delete, with active indicator) and the database used. Currently there are only implementations for Exasol, PostgreSQL and Oracle.
The sources of the module are tables that are read via the query defined by the datasqill developer. For these tables, the datasqill runtime user must have read rights. All source tables used in the SQL query must be connected to the module input in the datasqill GUI in the graphical data model.
The target of the module is a single table. The table must be in the same database as the source tables. For this, the datasqill runtime user must have write rights. The target table must be connected to the module output in the datasqill GUI in the graphical data model.
In the delivered state, the module offers the following attributes in the GUI for the datasqill developer:
| Name | Type | Meaning |
|---|---|---|
| Join Key Columns | Required Column List | The text "RL:" followed by a comma-separated list of columns to be used for the join condition. By default it says "RL:${tgt_pk_list}". The variable tgt_pk_list is replaced by the list of primary key columns of the target table |
| With Delete | Boolean | If enabled, then all data in the target table that is no longer present in the source query is logically deleted |
| Ignore Columns | Optional Column List | The text "OL:" followed by a comma-separated list of columns to be ignored during comparison and only added to the target table when other columns change |
| Inplace Columns | Optional Column List | The text "OL:" followed by a comma-separated list of columns to be ignored during comparison and overwritten in the current period (inplace) without historization when changed |
| Skip if empty | Boolean | If enabled, nothing is executed when the source is empty. Ignored when "Error if empty" is set |
| Error if empty | Boolean | If enabled, an error is generated |
When installing the module, additional attributes can be configured
The module can be extended so that the "Loop Query" attribute is activated (see configuring module attributes in the appendix):
| Name | Type | Meaning |
|---|---|---|
| Loop Query | SQL | Loop query to use the results in the main query. |
Then you have an additional attribute in which you can enter a query. When a loop query is captured, the module executes the main query for each record found. The column results can be used as variables.
It only makes sense to use this loop query to suppress execution of the transformation (if).
An example of using the loop query as an IF query:
SELECT DISTINCT 1
FROM <source-table>
WHERE <last-change-date> >= <today>-7
You get one row if something has changed in the last 7 days, otherwise none. Thus the main query is executed once or not at all.
In the main query you can use the column names of the loop query as variables. Note that the variables are each inserted as text (string).
See examples at the Insert module.
You can define technical columns that influence the behavior of the History module:
| Name | Value | Meaning | Description |
|---|---|---|---|
| valid_from_dt | RC:valid_from_dt | Period valid from | Set to current timestamp current_time when inserting a period |
| invalid_from_dt | RC:invalid_from_dt | Period valid to (exclusive) | Set to the value of end_of_time when inserting a new period. Set to current timestamp current_time when closing a period |
| current_time | ${string_to_datetime(module_utc_start_datetime)} | This value defines the current timestamp | In delivered state the start time of the module in date-time (without milliseconds) |
| end_of_time | ${string_to_datetime('9999-12-31 00:00:00')} | The value is used for the end of the last open period | In delivered state it is 31.12.9999 |
| is_current_and_active | OC:is_current_and_active | Column name for the current and active period | If set and available in the target table, the period that is current and also active is set to active, otherwise to inactive |
| is_current_and_active_true | Y | Active | Value with which the column is_current_and_active is to be marked as active |
| is_current_and_active_false | N | Inactive | Value with which the column is_current_and_active is to be marked as inactive |
| is_latest_period | OC:is_latest_period | Column name for the latest period | If set and available in the target table, the period that is current (whether active or deleted) is set to active, otherwise to inactive |
| is_latest_period_true | Y | Active | Value with which the column is_latest_period is to be marked as active |
| is_latest_period_false | N | Inactive | Value with which the column is_latest_period is to be marked as inactive |
| is_deleted | OC:is_deleted | Column name for the deletion indicator | If the column is available in the target table, the flag is set to inactive when the period is a deleted time range. Otherwise to active |
| is_deleted_true | Y | Active | Value with which the column is_deleted is to be marked as active |
| is_deleted_false | N | Inactive | Value with which the column is_deleted is to be marked as inactive |
| ins_by | OC:ins_by | Inserted by | Not changed on update. Set to current modifier current_by when inserting a row |
| upd_by | OC:upd_by | Modified by | Filled with the current modifier current_by |
| last_changed_dt | OC:last_changed_dt | Modified at | Filled with the current timestamp current_time |
| current_by | ${request_id} | This value defines the current modifier | In delivered state the Request ID of the datasqill action execution |
The column names used (under Value) can be adapted to your environment. See also configuring module attributes in the appendix.
If the target table contains some or several of these technical columns, they are filled according to the table above.