Data Historization / Versioning

There are various procedures that can be used to document older versions of a database as history. For example, copies of the entire database can be saved at regular intervals so that the exact version on day X can be viewed. However, this takes up a lot of storage space, since unchanged records are also constantly copied and saved.

In the following, an alternative, more efficient procedure is explained in which only the changes to the database are documented that are recognized after regular reconciliations. Time periods are assigned to the changed records during which they existed in the original database. This creates a historical change log that records every change made to the database. Required are the source query to be historized and a target table in which the change log is recorded.

Periodic Procedure for Historization

With the basic procedure of the module, the records from the source table (e.g. customer data) are transferred to a target table and supplemented there with the additional information valid_from_dt and invalid_from_dt. These indicate when the respective record was added to the source table and until when it existed there.

Example: History of the First Version

Source table after historical reconciliation on January 1, 2000

person_no name first_name date_of_birth
1 Monroe Marilyn 01.06.1926
2 Dean James 08.02.1931

Target table after historical reconciliation

person_no valid_from invalid_from name first_name date_of_birth
1 01.01.2000 31.12.9999 Monroe Marilyn 01.06.1926
2 01.01.2000 31.12.9999 Dean James 08.02.1931

At specified points in time, the current source table is compared with the version that was captured at the last specified point in time. The granularity of the reconciliation is defined by the user and can be as frequent as desired (day-, second-, millisecond-accurate). If no changes are detected during reconciliation, the target table contains the same records as the source table (plus the additional information valid_from etc.). When the system detects changes during regular reconciliation, these are written to the target table. These incongruities between the compared versions are documented in the target table as follows:

  • If an added record is detected in the source table, the module also adds a corresponding record in the target table.
  • If an existing record is modified, it is marked in the target table as an obsolete version by setting invalid_from to the point in time when the change was detected. The validity of the record is to be understood as exclusive of the specified time, because the exact time of the change cannot be determined, only the time when the change was detected. In addition, a new record is added to the target table that describes the current version containing the new information and which is active from this point in time (e.g. new contact data, correction of existing data). This takes the time when the change was detected as the valid_from value. Thus record versions in the target table are linked to associated validity periods during which they existed in the source table.

Example: Correction First Name

Source table after historical reconciliation on January 8, 2000

person_no name first_name date_of_birth
1 Monroe Marilyn 01.06.1926
2 Dean James 08.02.1931

Target table after historical reconciliation

person_no valid_from invalid_from name first_name date_of_birth
1 01.01.2000 08.01.2000 Monroe Marilyn 01.06.1926
1 08.01.2000 31.12.9999 Monroe Marilyn 01.06.1926
2 01.01.2000 31.12.9999 Dean James 08.02.1931

The procedure for deletions of records in the source table depends on whether these deletions should be detected and documented or not. If the system should detect deletions, there are two different methods to document them in the target table. Thus there are three procedural options that can be applied depending on the setting.

  1. No Documentation of Deletions

    In this mode, the system cannot detect or document deletions. A typical use case is booking documents that cannot be deleted.

    If a record is deleted, the target table remains unchanged.

Example: Person No. 2 Missing in Incremental Source

Source table on January 9, 2000

person_no name first_name date_of_birth
1 Monroe Marilyn 01.06.1926

Unchanged target table

person_no valid_from invalid_from name first_name date_of_birth
1 01.01.2000 08.01.2000 Monroe Marilyn 01.06.1926
1 08.01.2000 31.12.9999 Monroe Marilyn 01.06.1926
2 01.01.2000 31.12.9999 Dean James 08.02.1931
  1. Documentation of Deletions by Termination

    When a record is deleted in the source table in this mode, the corresponding record in the target table receives the point in time when the deletion was detected as the invalid_from value. Thus the deletion is only marked by terminating the last active version of a record.

Example: Deletion of Person No. 2 without Deletion Flag

Source table on January 10, 2000

person_no name first_name date_of_birth
1 Monroe Marilyn 01.06.1926

Target table after historical reconciliation

person_no valid_from invalid_from name first_name date_of_birth
1 01.01.2000 08.01.2000 Monroe Marilyn 01.06.1926
1 08.01.2000 31.12.9999 Monroe Marilyn 01.06.1926
2 01.01.2000 10.01.2000 Dean James 08.02.1931
  1. Documentation of Deletions by Deletion Flag

    Here the deletion is additionally documented by adding a record marked as deleted in the target table. The target table has a column that contains the information deleted: yes/no. On deletion, the last active version is terminated at the detection time (invalid_from) and marked as not deleted (is_deleted: no), since it was active until that time. In addition, a copy of the last active version is created as a new record in the target table, which is valid from the detection time of the deletion and is marked as deleted (is_deleted: yes).

Example: Deletion of Person No. 2 with Deletion Flag

Source table on January 10, 2000

person_no name first_name date_of_birth
1 Monroe Marilyn 01.06.1926

Target table after historical reconciliation

person_no valid_from invalid_from name first_name date_of_birth is_deleted
1 01.01.2000 08.01.2000 Monroe Marilyn 01.06.1926 N
1 08.01.2000 31.12.9999 Monroe Marilyn 01.06.1926 N
2 01.01.2000 10.01.2000 Dean James 08.02.1931 N
2 10.01.2000 31.12.9999 Dean James 08.02.1931 Y